TWI391834B - Systems for and methods of finding relevant documents by analyzing tags - Google Patents

Systems for and methods of finding relevant documents by analyzing tags

Info

Publication number
TWI391834B
Authority
TW
Taiwan
Prior art keywords
tag
target
user
targets
tags
Prior art date
Application number
TW95128551A
Other languages
Chinese (zh)
Other versions
TW200715152A (en)
Inventor
Lu Yunshan
Tanne Michael
Original Assignee
Search Engine Technologies Llc
Priority date
Filing date
Publication date
Application filed by Search Engine Technologies Llc
Publication of TW200715152A
Application granted
Publication of TWI391834B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Description

System and method for finding relevant documents by analyzing tags

Related Application

This application claims priority under 35 U.S.C. § 119(e) to the co-pending U.S. Provisional Patent Application No. 60/705,704, filed August 3, 2005, and titled "Techniques for Finding Relevant Documents Using Tag Analysis," which is hereby incorporated by reference.

This invention relates to searching for documents. More particularly, it relates to systems for and methods of searching for information on the Internet by analyzing tags generated by users, thereby improving the quality or relevance of search results.

Internet search engines are designed to locate desired information within the vast amount of information contained across the Internet. A user describes the information being sought by entering a query containing search terms. The search engine matches the search terms against an index of web pages, using a variety of relevance calculations whose objective is to identify the web pages most likely to be related to the information the user seeks. The search engine then returns a ranked list of hyperlinks to these web pages, with links to the pages considered most relevant placed nearer the top of the list.

The goal of a search engine is to deliver the most relevant web pages for a given query. Search engines determine the relevance of web pages using a variety of techniques: by considering the information contained on each page, such as the presence, density, and proximity of the search terms within the document; by considering information about the hyperlinks between web pages; or by considering user behavior, such as clicking on, browsing, or rating results or web pages. These techniques can be applied separately or in various combinations to achieve the best results.

Determining which web pages are most relevant is difficult because the number of web pages on the Internet is very large and constantly growing, and because a large number of pages often nominally satisfy the user's query. Moreover, most users are not skilled at creating and entering well-formed queries, so the type of information they are seeking can be ambiguous. As a result, determining which documents are most relevant to a query merely by comparing the words in the query with the words in the documents yields results of limited accuracy.

When users browse or search the Internet, they can "bookmark" various targets, such as web pages, images, topics, weblogs (also known as blogs), or other targets, by recording a reference to the target. A bookmark can contain one or more "tags." A tag consists of one or more terms that the user associates with the target, a hyperlink (uniform resource locator, or "URL") to the target, a mechanism for recording the relationship, and potentially other information. Bookmarks help users recall the target and any tags, which in turn helps them recall or make connections to other targets related to the bookmarked target. For example, a user who visits a web page describing solar panels for roofs can bookmark it and create a tag associating the term "solar power" with that page. The user can also associate the same tag, with the term "solar power," with another web page about a state solar power rebate program. As a result, the tag with the term "solar power" is associated with both web pages.

Users can enter tags in many ways, for example through a server application, an applet in a bookmark toolbar, a browser plug-in or extension, a client application, or some other application. Once tags have been entered, users are typically allowed to search on them to display the web pages associated with those tags. Services already exist that allow users to search their own tags or the tags of others.

A bookmark provides some indication that a user values a target, such as a web page, and a tag additionally provides some indication that the user associates the target with a certain term or terms. This information is potentially valuable in deciding whether a web page should be displayed as a result for a query submitted to a search engine, because it indicates both a real user's interest in the page and the user's association of the page with a particular subject.

What is needed, therefore, is a search engine that considers the tags associated with web pages, images, blogs, or other targets when determining which of those targets are relevant to a user's query.

Embodiments of the present invention provide users with a list of targets (the results list) in response to a search query. The results list is organized based on the relevance of each target to the query. Preferably, relevance is based on tagging of the targets, bookmarking of the targets, or both, or on any other user action that indicates the relevance or value of a target to a search.

In a first aspect of the present invention, a method of determining the relevance of a plurality of targets to a search query includes recording bookmarks to the plurality of targets, associating a plurality of tags with the plurality of targets, or both, and determining the relevance of each of the plurality of targets to any given query. The method is used to later organize the targets for display within a results list returned in response to a search query. Targets include hyperlinks, or groups of hyperlinks, to web pages, text, images, photos, tags, groups of tags, subject areas, concepts, user profiles, answers, audio files, video files, software, or any combination of these.

Each tag of the plurality of tags contains one or more terms. The method further includes associating each of the one or more terms with a target, thereby defining one or more corresponding term-target pairs, and determining a term score for each term-target pair that indicates the degree of relevance between the term and the target. Alternatively, or additionally, the method includes bookmarking the target.

Preferably, a relevance score for a tag-target pair is determined by combining the term scores of the term-target pairs for each term in the tag. The term scores are combined either by summing them or by weighting each term score and summing the weighted term scores.
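As a minimal sketch of the combination described in the paragraph above, the following Python function sums the term scores for a tag-target pair and optionally applies a per-term weight first. The function name `tag_target_score`, the dictionary layout, and the default weight of 1.0 are assumptions made for illustration; the text does not specify them.

```python
def tag_target_score(term_scores, weights=None):
    """Combine per-term scores into one score for a tag-target pair.

    term_scores: {term: score of the (term, target) pair} (hypothetical layout)
    weights:     optional {term: weight}; terms without a weight default
                 to 1.0 (an assumption, not stated in the text)
    """
    if weights is None:
        # Plain sum of the term scores.
        return sum(term_scores.values())
    # Weighted sum: weight each term score, then add the results.
    return sum(weights.get(term, 1.0) * score
               for term, score in term_scores.items())


scores = {"solar": 0.5, "power": 0.25}
print(tag_target_score(scores))                  # plain sum
print(tag_target_score(scores, {"solar": 2.0}))  # "solar" counts double
```

The unweighted sum is simply the special case in which every weight equals 1.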

In one embodiment, a relevance score for a tag-target pair is determined from the number of times a term occurs in the tags that have been associated with the target, the number of tags associated with the target, the number of times the tag has been associated with the plurality of targets, or any combination of these. A relevance score for a tag-target pair can also be determined from the number of tag-target pairs that contain a term in the tag, the number of tag-target pairs that contain an association to the target, or both.
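The text lists the counts that can feed a relevance score but gives no formula. The sketch below shows one possible combination, assuming a TF-IDF-like form: the share of a target's tags that use the term, scaled by a logarithmic rarity factor for the term across all pairs. The formula and the name `term_target_scores` are illustrative assumptions, not the patent's method.

```python
import math
from collections import Counter


def term_target_scores(pairs):
    """Score every (term, target) pair from simple counts.

    pairs: iterable of (term, target) tuples, one per tagging event.
    The formula is an assumed TF-IDF-like combination, for illustration only.
    """
    pairs = list(pairs)
    pair_count = Counter(pairs)                       # times term was put on target
    target_count = Counter(tg for _, tg in pairs)     # tags on each target
    term_count = Counter(term for term, _ in pairs)   # uses of each term overall
    total = len(pairs)                                # all term-target pairs
    return {
        (term, target): (n / target_count[target])
        * math.log(1 + total / term_count[term])
        for (term, target), n in pair_count.items()
    }


events = [("solar", "M"), ("solar", "M"), ("roof", "M"), ("solar", "N")]
for pair, score in sorted(term_target_scores(events).items()):
    print(pair, round(score, 3))
```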

In another embodiment, the method also includes associating, by a first user, a tag with a target from the plurality of targets; executing, by a second user, a search query containing one or more terms in the tag; organizing the plurality of targets in the results list based on the relevance scores, thereby defining an organized results list; and returning the organized results list to the second user. A relevance score for a target and the search query corresponds to the relevance scores of each term of the search query that appears in the target or is associated with the target. Alternatively, or additionally, a relevance score for a tag-target pair from the plurality of tag-target pairs is determined from the number of tags that the first user has associated with any of the plurality of targets, the number of targets that the first and second users have associated with tags, or both.
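The query-level score described above can be sketched as a sum, over the query's terms, of the per-term scores stored for each target. The nested-dictionary index, the whitespace tokenization, and the name `rank_for_query` are assumptions made for this example.

```python
def rank_for_query(query, index):
    """Rank targets for a query by summing per-term relevance scores.

    index: {term: {target: score}} (a hypothetical in-memory layout).
    Returns (target, total_score) tuples, highest score first.
    """
    totals = {}
    for term in query.lower().split():
        # Terms absent from the index contribute nothing.
        for target, score in index.get(term, {}).items():
            totals[target] = totals.get(target, 0.0) + score
    # Highest total first; ties broken by target name for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


index = {"solar": {"M": 0.5, "N": 0.25}, "rebate": {"N": 0.5}}
print(rank_for_query("solar rebate", index))
```

In this example, target N is tagged with both query terms and therefore ranks above M, even though M has the higher score for "solar" alone.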

In another embodiment, a relevance score for a tag-target pair is determined from a confidence rating of a user selected from the first user, the second user, or both. A confidence rating is determined from a rating of the tags that the selected user has associated with targets, a similarity metric between the bookmarking, tagging, or searching activities of the first and second users, a relationship metric between the first and second users, or any combination of these.

The plurality of targets can be organized by ranking them based on the relevance scores (for example, with the highest-ranked target listed first) or by marking at least one of the plurality of targets with a graphic element.
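The two ways of organizing a results list named above can be sketched as follows. The function names, the text marker standing in for a graphic element, and the rule "mark any target with a positive tag score" are assumptions for illustration.

```python
def rank_targets(targets, scores):
    """Order targets by relevance score, highest first.

    Targets without a score count as 0.0; ties keep their original order.
    """
    return sorted(targets, key=lambda t: scores.get(t, 0.0), reverse=True)


def mark_targets(targets, scores, marker="[tagged] "):
    """Keep the original order but flag targets that have a positive score.

    The text marker is a stand-in for the graphic element (assumption).
    """
    return [marker + t if scores.get(t, 0.0) > 0 else t for t in targets]


results = ["A", "M", "B"]
tag_scores = {"M": 0.9, "B": 0.2}
print(rank_targets(results, tag_scores))
print(mark_targets(results, tag_scores))
```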

A tag, a bookmark, or a rating can be associated with a target by entering the tag into a field presented to a user, rating the tag, blocking a link to the target (thereby marking a "negative" association), selecting the tag, selecting the target, detecting a bookmark, or executing a search for the target using the tag. In one embodiment, a tag crawler associates at least one of the plurality of tags with at least one of the plurality of targets.

In a second aspect of the present invention, a method of building a system for returning targets organized in a results list includes storing in a tag database a plurality of tags that have been associated with a plurality of targets, and storing in an index database relevance scores between the plurality of tags and the plurality of targets. The relevance scores are used to organize a plurality of documents in an organized results list.

The plurality of tags are stored in the tag database by storing the terms from the plurality of tags, and the relevance scores indicate the relevance between the terms and the targets. The method also includes storing a plurality of index entries in the index database, each index entry corresponding to a term from the plurality of terms, a corresponding target from the plurality of targets, and a corresponding relevance score between the term and the target.
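An index entry as described above is a (term, target, score) triple. The sketch below stores such entries in an in-memory SQLite table using Python's standard `sqlite3` module; the table name `index_entries`, the column names, and the helper functions are hypothetical choices, since the text does not prescribe a schema.

```python
import sqlite3


def build_index(entries):
    """Create an in-memory index of (term, target, score) entries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE index_entries ("
        " term TEXT NOT NULL,"
        " target TEXT NOT NULL,"
        " score REAL NOT NULL,"
        " PRIMARY KEY (term, target))"
    )
    conn.executemany("INSERT INTO index_entries VALUES (?, ?, ?)", entries)
    conn.commit()
    return conn


def lookup(conn, term):
    """Return (target, score) rows for a term, most relevant first."""
    rows = conn.execute(
        "SELECT target, score FROM index_entries"
        " WHERE term = ? ORDER BY score DESC, target",
        (term,),
    )
    return rows.fetchall()


conn = build_index([("solar", "M", 0.5), ("solar", "N", 0.75), ("roof", "M", 0.25)])
print(lookup(conn, "solar"))
```

Keying the table on (term, target) keeps one score per pair, matching the one-entry-per-pair description.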

In one embodiment, each corresponding relevance score between a term and a target is related to a confidence in the user who associated the term with the target. Alternatively, or additionally, each relevance score for a term and a target is determined from the number of times the target has been bookmarked, or from the number and values of the ratings the target has been given. Alternatively, or additionally, a relevance score between a term and a target is determined using a statistical classification or rank regression algorithm, such as logistic regression, support vector machines, classification or regression trees, or boosted tree ensembles.

The method also includes presenting a results list to a user in response to a search query that contains a term, associating, by the user, the term with a target contained in the results list, and determining a relevance score between the term and the target. A relevance score between a target from the plurality of targets and a tag associated with it is determined from the number of times the tag has been associated with the target, the total number of tags that have been associated with the target, the number of times the tag has been associated with any of the plurality of targets, the number of tags that have been associated with all of the plurality of targets, the time at which the tag was associated with the target, or any combination of these.

In a third aspect of the present invention, a method of organizing a plurality of targets for display in a results list includes correlating terms in a search query with tags associated with a plurality of targets, and returning a results list containing the plurality of targets organized based on the correlations.

In a fourth aspect of the present invention, a system for returning a search results list in response to a search query includes a tag database for storing tags associated with targets, and a tag analyzer coupled to the tag database. The targets include hyperlinks to web pages, groups of hyperlinks, text, images, photos, tags, groups of tags, subject areas, concepts, user profiles, answers, audio files, video files, software, or any combination of these. Preferably, the targets are hyperlinks to web pages.

The tag analyzer is programmed to determine a relevance score between a tag and a target. In one embodiment, the system also includes a target index for storing the relevance scores between tags and targets.

In one embodiment, a relevance score is determined by summing weighted relevance scores for a target and the terms that form a tag. In another embodiment, a relevance score between a search query containing terms and a target is determined from the number of tags that contain a term in the search query, the number of times a tag contained in the search query is included in the tag database, the number of tags that have been associated with the target, the number of terms that match between the tag and the search query, or any combination of these. In yet another embodiment, a relevance score for a tag and a target is based on a location of the tag within the target, a frequency of the tag within the target, a density of the tag within the target, or any combination of these.

In one embodiment, a relevance score between a tag and a target is weighted based on a degree of confidence (a confidence rating) assigned to the user who associated the tag with the target. The relevance score is determined using a statistical classification or rank regression algorithm, a clustering analysis algorithm, or a morphological analysis algorithm. The statistical classification or rank regression algorithm includes logistic regression, support vector machines, classification trees, or classification tree ensembles.
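One simple way to realize the confidence weighting described above is to let each tagging event contribute the tagging user's confidence rating rather than a flat count of 1. The sketch below assumes ratings in the range 0 to 1, a default of 0.5 for unknown users, and the name `confidence_weighted_scores`; none of these come from the text.

```python
from collections import defaultdict


def confidence_weighted_scores(taggings, confidence, default=0.5):
    """Sum user confidence ratings per (term, target) pair.

    taggings:   iterable of (user, term, target) tuples
    confidence: {user: rating}, assumed here to lie in [0, 1]
    default:    rating for users with no history (an assumption)
    """
    scores = defaultdict(float)
    for user, term, target in taggings:
        # A trusted user's tag counts for more than an untrusted user's.
        scores[(term, target)] += confidence.get(user, default)
    return dict(scores)


taggings = [
    ("alice", "solar", "M"),
    ("bob", "solar", "M"),
    ("spam_bot", "solar", "S"),
]
ratings = {"alice": 1.0, "bob": 0.5, "spam_bot": 0.0}
print(confidence_weighted_scores(taggings, ratings))
```

With a rating of 0, tags from a user judged untrustworthy leave the scores unchanged.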

In another embodiment, the system also includes a search engine coupled to the target index. The search engine is programmed to receive search queries containing terms that correspond to tags and to return an organized results list based on the relevance scores of tag-target pairs. The system also includes a user database coupled to the search engine. The user database contains information related to search queries, such as links followed by a user, tags associated with targets, targets blocked by a user, bookmarks, or any combination of these.

Unlike traditional search engines, embodiments of the present invention use tags, bookmarks, or both to provide more relevant information to users searching the Internet. In one embodiment, a search engine performs an established method to receive a query and determine a list of relevant documents or groups of documents, analyzes the tags associated with documents or groups of documents to determine a second list of relevant documents or groups of documents, and combines the two lists in some manner to deliver a results list to the user. It will be appreciated that although the examples that follow describe searching for and returning documents, the present invention can be used to search for and return any target, including, but not limited to, hyperlinks or groups of hyperlinks to web pages, text, images, photos, tags, groups of tags, subject areas, themes, charts, answers, audio files, video files, software, or any combination of these, to name only a few targets.
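The text leaves open how the two lists are combined. As one possible reading, the sketch below blends a conventional text-relevance score with a tag-based score using a linear weight `alpha`; the linear blend, the parameter name, and the function name `combine_lists` are assumptions for illustration only.

```python
def combine_lists(text_scores, tag_scores, alpha=0.5):
    """Merge a text-based and a tag-based result list into one ranking.

    text_scores, tag_scores: {target: score}; a missing target counts as 0.0.
    alpha: share given to the tag-based score (assumed linear blend).
    """
    targets = set(text_scores) | set(tag_scores)
    combined = {
        t: (1 - alpha) * text_scores.get(t, 0.0) + alpha * tag_scores.get(t, 0.0)
        for t in targets
    }
    # Highest combined score first; ties broken by target name.
    return sorted(combined, key=lambda t: (-combined[t], t))


text_only = {"A": 1.0, "M": 0.5}   # page M ranks second on text alone
tags_only = {"M": 1.0}             # but users have tagged M with the term
print(combine_lists(text_only, tags_only, alpha=0.0))
print(combine_lists(text_only, tags_only, alpha=0.5))
```

With `alpha=0.0` the ranking reduces to the text-only order; raising `alpha` lets the tagged page M move above A.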

For example, in accordance with the present invention, suppose a query with the term X would return a results list of web pages that includes a web page M somewhere in the list. A first user then associates web page M with a tag containing the term X. A second user who later executes a search using the term X in a query is delivered a results list that displays web page M at a higher position than it would have occupied before the first user created the tag.

In accordance with embodiments of the present invention, the extent to which web page M is considered more relevant for the second user is determined by analyzing factors including, but not limited to: the number of times the term X is used in tags for web page M; the total number of all tags associated with web page M; the number of tags the first user has created; the number of documents each user has bookmarked or tagged; the frequency with which the term X is used as a tag overall; the total number of tags overall; the number of tag/document pairs that contain the term X, a reference to web page M, or both; the relationship between the first and second persons; and the degree of confidence in the first and second persons, or in the groups to which the first or second person belongs. The relationships, or degrees of relationship, between users and the groups to which they belong, as well as the degrees of confidence in users, can likewise be quantified using metrics. For example, two users with a relationship metric of 1 are more similar to each other (for example, they have similar interests or share common friends) than two users with a relationship metric of 0.5. In addition, if the term X contains more than one word, other factors can be analyzed, such as the number of words of the term X that are contained in the second user's query, whether the words are used as a phrase, the order of the words, and all of the factors mentioned previously. The analysis can also include analyzing different combinations of the words.
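The text says that relationships between users can be quantified with a metric on which 1 means more similar than 0.5, but it does not name a metric. As an illustration only, the sketch below uses the Jaccard index over the sets of targets two users have bookmarked, which yields values from 0 to 1; the choice of Jaccard and the name `relationship_metric` are assumptions.

```python
def relationship_metric(bookmarks_a, bookmarks_b):
    """Jaccard similarity of two users' bookmarked targets, from 0.0 to 1.0.

    Jaccard is a stand-in chosen for this example; the source does not
    specify how the relationship metric is computed.
    """
    a, b = set(bookmarks_a), set(bookmarks_b)
    if not a and not b:
        return 0.0  # no evidence of similarity either way (assumption)
    return len(a & b) / len(a | b)


print(relationship_metric({"M", "N"}, {"M", "N"}))  # identical interests
print(relationship_metric({"M", "N"}, {"N", "P"}))  # partial overlap
```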

In accordance with other embodiments of the present invention, the search engine may not deliver a results list that is ranked differently, but may instead place graphic elements to indicate which documents have been included because they have tags associated with them.

In accordance with other embodiments of the present invention, tags may not be explicitly associated with documents by users, but may instead be inferred by examining bookmarks, conducted searches, or other user behaviors, such as rating, blocking, saving, or clicking.

In accordance with other embodiments of the present invention, tags can be associated not only with web pages or groups of web pages but also with any identifiable data source, public or non-public, including, but not limited to, images, photos, other tags, groups of tags, subject areas, user profiles, concepts, maps, audio or video files, software, or other targets.

Throughout the following description, the term "search engine" refers to an apparatus (or a program running on a general-purpose computer) that takes a query as input and produces a results list of hyperlinks to electronic documents, web pages, or other targets accessible over a network. The search engine includes an index of the documents in its corpus, the code and algorithms that determine the relevance of each document, and a graphical user interface that delivers the results list to the user.

Throughout the following description, the term "query" refers to a set of terms submitted to the search engine, whether typed, spoken, submitted through a "link" that already has a set of search terms embedded in it, or submitted through any other interface. A query can comprise a single word, multiple words, or phrases. A query can be phrased as a question (for example, a "natural language" query), a loose combination of terms, or a structured Boolean expression. Indeed, a query can comprise symbols or any other characters that a search engine uses to search for electronic documents or web pages that contain or are related to the search characters.

Throughout the following description, the term "web site" refers to a collection of web pages that are linked together and are available on the World Wide Web. The term "web page" refers to a publication posted on a web site and accessible over the World Wide Web from any number of hosts; it includes, but is not limited to, text, video, images, music, and graphics.

Throughout the following description, the term "results list" refers to a list of hyperlinks, or groups of hyperlinks, that reference documents, targets (as defined above, including, but not limited to, images and video), or web pages that can be accessed using the Hypertext Transfer Protocol (HTTP) or any other protocol for accessing web pages or other electronic documents, together with other associated information for each link. That information includes, but is not limited to, the titles of the documents, summaries of the documents, the number of associated tags or other related metrics, lists of the associated tags, links to cached copies of the documents, the dates on which the documents were last indexed or last modified, images associated with or located in the documents, information extracted from the documents, and the users who have bookmarked or tagged them.

Throughout the following description, the term "tag" refers to any data structure that contains any of the following: one or more terms, each comprising one or more words; a hyperlink that refers to an addressable target; and other information, such as the time at which the tag was created and the user who created it. A tag can contain links to multiple targets, such as a web page, an image, a map, or another target on a computer network, whether on the Internet or on a local computer storage device. "Tagging" refers to the process of associating a term with a particular hyperlink that points to an addressable document or target.

As used herein, the term "bookmark" refers to any data structure that records a hyperlink, an indication of the user who created the bookmark, the time at which the bookmark was created, and a tag as defined above.
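The tag and bookmark definitions above map naturally onto two small record types. The following sketch uses Python dataclasses; the class names, field names, helper function, and `example.com` URLs are hypothetical and serve only to make the definitions concrete, using the solar power example from earlier in the description.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class Tag:
    terms: Tuple[str, ...]   # one or more terms, each of one or more words
    target_url: str          # hyperlink to the addressable target
    user: str                # user who created the tag
    created_at: datetime     # time at which the tag was created


@dataclass
class Bookmark:
    target_url: str
    user: str
    created_at: datetime
    tags: List[Tag] = field(default_factory=list)


def targets_for_term(bookmarks, term):
    """Return the sorted URLs of all targets tagged with `term`."""
    return sorted({tag.target_url
                   for bm in bookmarks
                   for tag in bm.tags
                   if term in tag.terms})


when = datetime(2005, 8, 3)
urls = ["https://example.com/roof-panels", "https://example.com/state-rebate"]
bookmarks = [
    Bookmark(url, "alice", when, [Tag(("solar power",), url, "alice", when)])
    for url in urls
]
print(targets_for_term(bookmarks, "solar power"))
```

Both hypothetical pages carry the same term, so a lookup on "solar power" returns both URLs.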

As used herein, the term "document" is given a broad definition and, in addition to its ordinary meaning, includes computer files and web pages. The term "document" is not limited to computer files containing text; it also covers user profiles, concepts, answers, and computer files containing graphics, audio, video, and other multimedia data. A user profile is a page or record that contains, but is not limited to, information about a person, such as the person's interests, habits, list of friends, photos, professional experience, and education, to name a few examples.

As used herein, the term "spammer" is defined as a person or entity that attempts to make a search engine display links to its products, web pages, or other material at a higher rank or with greater frequency than the search engine otherwise would, using any number of techniques designed to take advantage of the search engine's relevance methodology.

As used herein, the term "programming" refers to any combination of hardware, software, firmware, or other means for executing computer instructions that is used to store, process, transmit, or otherwise manipulate data.

As described in more detail below, a search engine accepts a query entered by a user and matches the search terms against a document index, using a variety of relevance calculations whose purpose is to identify the documents most likely to relate to the information the user is seeking. The search engine then produces a ranked list of hyperlinks to those documents, with the documents considered most relevant nearest the top of the list. In accordance with the present invention, users are able to create tags that associate terms with documents, and a search engine returns a results list based at least in part on an analysis of the tags associated with web pages.

In accordance with the present invention, the degree to which a tag affects the relevance of a document to a given query can depend on the level of confidence in the user who created the association. That confidence level can be determined by a number of factors, for example how relevant the user's tags have proven in the past, how similar the user's observed activity is to that of other like-minded users or of the user executing the query, the degree of connection between the users, and other factors.

FIG. 1 is a screen shot of a graphical user interface (GUI) displaying a results page 100 returned in response to a query, in accordance with the present invention. The results list can be reordered or annotated based on an analysis of the tags associated with each link.

The results page 100 includes a box 110 for entering a query term and an area 120 containing the results list returned by the search engine. The area 120 can also contain a list of tags 150 associated with each result returned by the search engine. As described in more detail below, in a preferred embodiment some or all of the results in the area 120 have been reordered based on an analysis of the tags 150. In an alternative embodiment, the results in the area 120 can likewise be reordered, but some of them are marked with a graphic element 140 to indicate that the analysis of the tags and/or bookmarks 150 has affected their relevance. The results page 100 also includes a mechanism 165 for rating targets.

Users can establish an association between documents and the terms they believe describe those documents. This process, as described earlier, is called "bookmarking" or "tagging". In the case of bookmarking, it is accomplished by clicking a hyperlink or graphic element 160 in the area 120 to activate a mechanism that records the link for later retrieval, or by using an extension in the bookmarks toolbar, a toolbar, or an applet. In the case of tagging, it is accomplished by creating a new tag, or multiple new tags, associated with a document. The element 160 can be a text link, an image (for example, a disk), or any other icon that suggests "bookmarking" or "tagging" the document. Because different users have different ideas about which terms should be associated with which documents, a large and varied collection of tags can be built up. It is this collection of tags that is analyzed in accordance with the present invention.

Users can block a link to a document from a results list if they do not consider the document relevant to the query. This process is called "blocking" and is accomplished by clicking a hyperlink or graphic element 170 in the area 120, which activates a mechanism that blocks the link to the document. The element 170 can be a text link, an image (for example, 170, FIG. 1), or any other icon that suggests "blocking" or "removing" the document. Blocking the document has the effect of associating with the document a negative tag containing the terms in the query. Because different users have different ideas about which terms should be associated with which documents, there will sometimes be honest disagreement about whether a tag is appropriate for a document; at other times, spammers will deliberately associate misleading tags with documents. As described in more detail below, positive and negative tags can be used to balance differing opinions and to reduce the amount of spam or other misleading documents.

The results page 100 can also include an area 180 for displaying a description of a concept related to the query term, and an area 190 containing "see also" links to other concepts related to other query terms.

As shown in the embodiment of FIG. 1, when a user enters the query term "U2" in the box 110 and requests a search, the results page 100 is returned to the user. The area 120 contains a list of results 130 that link to targets related to the query term "U2". In a preferred embodiment, the results 130 have been reordered based on an analysis of the tags 150 that users have associated with various documents. For example, the document at www.atu2.com titled "U2 Home Page: @U2..." has been tagged with the terms "U2", "U2 fan site", and "U2 fans", as shown in the tag list 150, and the analysis of those tags has caused the document to be listed higher in the results list than it otherwise would be. In an alternative embodiment, results can be both reordered and marked, with some results marked with a graphic element 140 to show that the analysis of user tags and/or bookmarks indicates that people have found those results more relevant and, optionally, how many people found them relevant. For example, the document at www.u2station.com titled "U2 Station" has been tagged with the terms "U2" and "U2 fan site" and has been marked with a graphic element 140 in the shape of a person to indicate that other users have found it relevant. It will be appreciated that graphic elements other than a person-shaped symbol can also be used to tell the user that other users have indicated relevance.

If the user wishes to bookmark and/or tag a document, for example www.u2log.com, the user can click the graphic symbol 160, which activates a mechanism for bookmarking and/or adding a tag. The added tag can be the same as tags that already exist, can be the search term in the box 110, or can be some other term meaningful to the user. If the user has run a different search, for example "lyrics", and does not believe that the document titled "U2 Wanderer.org U2 Discography and U2 Lyrics Site" belongs in the results list for the query "lyrics", the user can block the document from the search results by clicking the element 170. This has the effect of associating with the document a tag containing the term "-lyrics", where the minus sign ("-") denotes a dissenting, rather than affirming, association between the term and the web page.
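The way a block becomes a negative tag can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism of the specification: the TagStore class, its method names, the user names, and the document identifier "doc:u2wanderer" are all hypothetical, and the whole query is treated as a single term.

```python
from collections import defaultdict


class TagStore:
    """Hypothetical in-memory store of signed tag terms per document."""

    def __init__(self):
        self._tags = defaultdict(list)  # document id -> list of (user, signed term)

    def add_tag(self, doc, user, term):
        """Record an affirming tag."""
        self._tags[doc].append((user, term))

    def block(self, doc, user, query):
        """Record a block as a negative tag: the query term prefixed with '-'."""
        self._tags[doc].append((user, "-" + query.strip()))

    def net_votes(self, doc, term):
        """Affirming minus dissenting tags for one term on one document."""
        terms = [signed_term for _, signed_term in self._tags[doc]]
        return terms.count(term) - terms.count("-" + term)


store = TagStore()
store.add_tag("doc:u2wanderer", "alice", "lyrics")   # one affirming tag
store.block("doc:u2wanderer", "bob", "lyrics")       # two users block the document
store.block("doc:u2wanderer", "carol", " lyrics ")
print(store.net_votes("doc:u2wanderer", "lyrics"))   # prints -1
```

Storing the dissent as an ordinary tag with a leading minus sign keeps affirming and dissenting opinions in one collection, so a later analysis can weigh them against each other.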

Continuing the example, the area 180 contains a description of the concept of the band "U2" and a list of other concepts related to the term "U2". The area 190 contains a set of "see also" links to related topics, for example "Bono, U2 concerts, best-selling music artists, ONE Campaign, Live 8...". In accordance with the present invention, if the user selects one of these links, for example "Live 8", a query is issued with the search term "Live 8", producing a results page similar to the results page 100 in which the search term 110 is "Live 8" and the results list 120 is a list of documents related to that search term. The positions in that list are in turn influenced by the tags associated with those documents, so the order of results from any search is affected by tags. Topic areas and concepts are described in U.S. Patent Application Ser. No. 11/364,617, filed February 27, 2006 and titled "Methods of and Systems for Searching by Incorporating User-Entered Information", which is incorporated herein by reference.

It will be appreciated that many modifications can be made in accordance with the present invention. For example, tags created by users can be read from a file or imported from another service, in addition to being entered directly by a user at a terminal. Moreover, although the results page 100 shows the results list 120, the tag list 150, the concept 180, and the links 190 to concepts, it will be appreciated that, in accordance with the present invention, a results page influenced by tag analysis can be displayed with any combination of areas, including those shown in FIG. 1, others in addition to those, or only some of them. The tag information is used in combination with a variety of page design elements to produce broader, more accurate, and more meaningful search results.

FIG. 2 is a flow chart illustrating the operation of an Internet search application 200 in accordance with the present invention. The Internet search application 200 lets users submit queries to the search engine and receive results determined at least in part by a tag analysis, giving users more relevant search results than they would otherwise receive. Users can visit the web pages shown in the results list, and they can choose to bookmark some of those pages to indicate whether they found them relevant or not relevant. They can also tag those pages, using the query term, some other term, or several terms. The search engine records any tags submitted and uses them in future searches performed by other users.

In step 210, the user submits a query to a search engine. The process then proceeds to both step 220 and step 230. In step 220, the search engine matches the query against the targets in the corpus, using any of a variety of information retrieval techniques and algorithms, to assemble a list of the most relevant documents. In step 230, the search engine analyzes the tags associated with the various documents to assemble a list of the most relevant documents; the tag analysis can be the same for all users, can be tailored to the individual user performing the search, or can be tailored to a group of which that user is a member. Steps 220 and 230 proceed to step 240, in which the results from step 230 are combined with the results from step 220 to provide more relevant results. The process then continues to step 250, in which the results page (for example, 100, FIG. 1) is sent to the user. From step 250 the user can proceed to step 260 or step 270.
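The specification does not say how step 240 combines the two result sets. One simple possibility, shown here purely as an assumption, is a weighted blend of a retrieval score and a tag score per document. The function name combine_results, the tag_weight parameter, the document identifiers, and all scores are hypothetical.

```python
def combine_results(ir_scores, tag_scores, tag_weight=0.5):
    """Blend two {document: score} maps and return documents, best first."""
    documents = set(ir_scores) | set(tag_scores)
    blended = {
        doc: (1.0 - tag_weight) * ir_scores.get(doc, 0.0)
        + tag_weight * tag_scores.get(doc, 0.0)
        for doc in documents
    }
    # Sort by descending blended score; ties are broken by document id.
    return sorted(documents, key=lambda doc: (-blended[doc], doc))


ir = {"doc-a": 0.9, "doc-b": 0.5, "doc-c": 0.4}   # step 220 output (made-up scores)
tags = {"doc-b": 1.0, "doc-c": -0.5}              # step 230 output (made-up scores)
print(combine_results(ir, tags))                  # prints ['doc-b', 'doc-a', 'doc-c']
```

In this sketch a document with strong affirming tags (doc-b) overtakes a document that ranked higher on text matching alone, while a document with net dissenting tags (doc-c) drops to the bottom.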

In step 260, the user follows one or more of the links to visit the documents in the results list. Alternatively, in step 270, the user enters bookmarks and, optionally, tags, each of which associates one or more terms with one of the documents in the results list. To enter a tag, the user can click a mechanism for bookmarking or tagging a document (for example, area 160, FIG. 1), which presents a user interface for entering the tag. Alternatively, in step 260, the user can use a "bookmarklet" loaded in the browser, or a similar mechanism, to bookmark the document and enter the tag. Alternatively, in step 270, the user can block a document considered not relevant to the query by clicking a mechanism for blocking. From step 260 (visiting documents in the results list) the user can proceed to step 270 (entering tags), and the user can likewise proceed from step 270 to step 260. Both steps 260 and 270 lead to step 280, in which the system records the bookmarks, tags, and ratings entered by the user. Step 280 loops back to step 230, so that the database of tags analyzed during any subsequent search includes the new tags entered in step 270. In parallel with the loop back to step 230, the process can also continue to step 290, in which the user concludes the search.

FIG. 3 illustrates the components of a system 300 in accordance with the present invention. The system 300 includes a user client 305 coupled to a Web server 310. The Web server 310 is coupled to a search engine 320, a user database 330, and a tag database 340. The search engine 320 is coupled to a document index 350, and the user database 330 is also coupled to the search engine 320. The tag database 340 is coupled to a tag analyzer 360 and a tag crawler 391. The tag analyzer 360 is also coupled to the document index 350, which in turn is coupled to an indexer 370. The indexer 370 is coupled to a Web content database 380, which is coupled to a Web crawler 390. The Web crawler 390 and the tag crawler 391 are coupled to one or more Web sites 399 over the Internet 395.

In operation, the Web crawler 390 navigates over the Internet 395, visiting Web sites 399 and populating the Web content database 380 with the contents of the web pages it accesses. The indexer 370 uses the Web content database 380 to create the document index 350. The tag crawler 391 navigates over the Internet 395, visiting Web sites 399 and populating the tag database 340 with the tags it finds.
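The indexer's role can be illustrated with a toy inverted index. The specification does not describe the internal format of the document index 350, so the structure below (a map from lowercase word to the set of URLs containing it), the function name build_document_index, and the page texts are assumptions made for illustration.

```python
from collections import defaultdict


def build_document_index(web_content):
    """Build a word -> set-of-URLs index from a {url: page text} mapping."""
    index = defaultdict(set)
    for url, text in web_content.items():
        for word in text.lower().split():
            index[word].add(url)
    return dict(index)


# Stand-in for the Web content database; the page texts are made up.
web_content = {
    "www.atu2.com": "U2 home page",
    "www.u2station.com": "U2 station",
}
index = build_document_index(web_content)
print(sorted(index["u2"]))   # prints ['www.atu2.com', 'www.u2station.com']
```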

When a user performs a search, the user enters a query using the user client 305, and the query is submitted to the Web server 310. The Web server 310 submits the query to the search engine 320, which matches the query against the document index 350, using relevance algorithms and factors derived from the tag analysis described above, to determine the most relevant documents, and returns the results list to the Web server 310. The Web server 310 then delivers the results page (for example, 100, FIG. 1) to the user client 305 for display.

Also in response to the query, the user database 330 records information about the user's search, such as links from the results list (for example, area 120, FIG. 1), documents bookmarked or rated (for example, mechanism 165), tags entered using the tag entry mechanism (for example, area 160, FIG. 1), and documents blocked using the blocking mechanism (for example, area 170, FIG. 1), which has the effect of entering negative tags. This information can be used by the Web server 310 and the search engine 320 to customize subsequent search results for that user and to determine how much confidence to place in that user's tags. In addition, in response to a query, the tags the user enters with the tag entry mechanism (for example, area 160, FIG. 1) and the negative tags the user enters with the blocking mechanism (for example, area 170, FIG. 1) are also recorded in the tag database 340. In one embodiment of the present invention, the information stored in the user database 330 and the tag database 340 can be implemented as two separate databases or within a single database.

On some timely basis, but not necessarily when a query is performed, the tag information contained in the tag database 340 is sent to the tag analyzer 360, where it is analyzed to determine what effect the various tags associated with each document should have on relevance, so that the search engine 320 can determine the web pages most relevant to a query. The tag analyzer 360 records this tag relevance information in the document index 350 for use in subsequent searches.

The tag database 340 sends features to the tag analyzer 360, including but not limited to query terms, user identifiers, document IDs, document links, tag terms, ratings, and time stamps. The tag analyzer 360 can also look up other features for a given document, including but not limited to the density of the terms within the document, the positions of the terms within the document, the presence of the terms in particular sections of the document, and hyperlinks to the document that contain the terms. The tag analyzer 360 can also look up other features for a given user, including but not limited to previous tagging history, bookmarking history, confidence level, similarity to other tags (for example, the similarity between the search terms used and the tags created by this user and by other users), and group membership.
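Two of the document features named above, the density and the position of a term within a document, can be computed along the following lines. This is a rough sketch that assumes whitespace tokenization and case-insensitive matching; the function name term_features and the returned keys are hypothetical.

```python
def term_features(document_text, term):
    """Return the count, density, and first position of a term in a document."""
    words = document_text.lower().split()
    term_words = term.lower().split()
    if not words or not term_words:
        return {"count": 0, "density": 0.0, "first_position": None}
    width = len(term_words)
    positions = [
        i for i in range(len(words) - width + 1)
        if words[i:i + width] == term_words
    ]
    return {
        "count": len(positions),
        # Words in matched occurrences divided by words in the document
        # (overlapping matches are counted separately).
        "density": len(positions) * width / len(words),
        "first_position": positions[0] if positions else None,
    }


features = term_features("U2 fan site for U2 fans", "U2")
print(features["count"], features["first_position"])   # prints 2 0
```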

The tag analyzer 360 uses these features to develop a set of relevance scores for various documents based on the different tags. The process by which these features are analyzed is shown in FIG. 4. The mechanism is used to calculate relevance for any given query-document pair, either at the level of an individual user or as a generic solution.

An analysis is performed on the tag data discussed above. In general, for any given query, the relevance of any given document will be a function of factors including but not limited to: the number of tags containing the terms in the query, the number of times any given tag is used in the corpus of tags, the total number of tags referring to the given document, the number of similar tag-document pairs, the number of words in the matching terms, the number of times the document has been bookmarked, and the values and number of ratings applied to the document. In addition, for a given document, the power of any tag to predict relevance is proportional to the confidence level of the user who entered the tag, if that can be assessed. It will be appreciated that relevance modeling in accordance with the present invention can be performed using other forms of analysis and other methods, including but not limited to any statistical classification or ranking regression algorithm, such as logistic regression, support vector machines, classification or regression trees, or boosted tree ensembles.
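Of the modeling options listed above, logistic regression is the easiest to show in a few lines. The sketch below only applies an already-fitted model to one query-document pair; the feature names and weight values are made up for illustration, and in practice the weights would be learned from judged query-document pairs.

```python
import math


def relevance_probability(features, weights, bias=0.0):
    """Logistic model: squash a weighted sum of features into the range 0 to 1."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical feature names and weights, not values from the specification.
weights = {"matching_tags": 0.8, "times_bookmarked": 0.3, "negative_tags": -1.0}
features = {"matching_tags": 3, "times_bookmarked": 2, "negative_tags": 1}
print(round(relevance_probability(features, weights), 3))   # prints 0.881
```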

FIG. 4 is a flow chart illustrating steps 400 for preparing and analyzing tag data, in preparation for determining the relevance of documents to queries, in accordance with the present invention.

Referring to FIG. 4, in step 410 tag data is entered by users through a Web client and a Web server, or by a tag crawler (for example, 391, FIG. 3), and is submitted by the system (for example, step 280, FIG. 2) to a tag database (for example, 340, FIG. 3). Step 410 can proceed continuously over a period of time, independently of the remaining steps of the process described in FIG. 4.

In step 420, each tag in the tag database (340, FIG. 3) is analyzed. The process of analyzing each tag proceeds to step 430, which calculates user confidence, and step 440, which determines weighted tag counts. In step 430, for each tag entered, the confidence level of the user who entered the tag is calculated; the degree to which the tag affects the relevance of the document with which it is associated is a function of the confidence level of the user who entered it.

The confidence level can be calculated using an algorithm or a statistical model of user behavior, based on, but not limited to, how closely the user's bookmarking, tagging, clicking, ranking, or blocking behavior matches the behavior of the user community for the given term or topic area, or on the confidence that other users have placed in the user by rating the user, connecting to the user in a social network, or subscribing to the tags the user enters. For example, suppose a user, Luke, tags a given document X with the term A, while other users have tagged document X with the terms D and F. The complete set of tags associated with document X is then {a, d, f}, where the lowercase symbol "a" denotes an instance of the term "A" used as a tag. Continuing the example, if two other users, Simon and Peter, each search using the query term A and each block document X, document X is now tagged {a, -a, -a, d, f}. As a result, Luke's confidence level is reduced because a plurality of users disagree with his tag, and Simon's and Peter's confidence levels are increased because their tags agree with the plurality. It will be appreciated that other methods consistent with the present invention can be used to determine a user's confidence level. If the user is unknown, or the user's confidence level cannot be determined, a neutral confidence level, or one inherited from the source from which the tag was obtained, is assigned to the tag. It will also be appreciated that user confidence can be calculated at the time tags are analyzed or through some process that takes place at another time.
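The Luke, Simon, and Peter example can be replayed with a very simple agreement rule. The specification leaves the actual algorithm open, so everything below is an assumption: the neutral value of 0.5, the fixed step of 0.1, the rule that a lone vote or a tie carries no evidence, and the names user_d and user_f for the unnamed users who tagged the document with D and F.

```python
from collections import Counter, defaultdict

NEUTRAL = 0.5  # assumed neutral confidence for users with no evidence either way


def update_confidence(votes, step=0.1):
    """votes: list of (user, document, term, sign); sign is +1 for a tag, -1 for a block."""
    net = defaultdict(int)   # (document, term) -> sum of signs
    voters = Counter()       # (document, term) -> number of votes
    for _, doc, term, sign in votes:
        net[(doc, term)] += sign
        voters[(doc, term)] += 1

    confidence = {user: NEUTRAL for user, _, _, _ in votes}
    for user, doc, term, sign in votes:
        key = (doc, term)
        if voters[key] < 2 or net[key] == 0:
            continue  # a lone vote or a tie says nothing about agreement
        agrees = (net[key] > 0) == (sign > 0)
        delta = step if agrees else -step
        confidence[user] = min(1.0, max(0.0, confidence[user] + delta))
    return confidence


votes = [
    ("luke", "X", "A", +1),     # Luke tags document X with term A
    ("user_d", "X", "D", +1),   # another user tags X with D
    ("user_f", "X", "F", +1),   # another user tags X with F
    ("simon", "X", "A", -1),    # Simon blocks X for query A
    ("peter", "X", "A", -1),    # Peter blocks X for query A
]
for user, value in sorted(update_confidence(votes).items()):
    print(user, round(value, 2))
```

Under these assumptions Luke ends below the neutral value, Simon and Peter end above it, and the two users whose tags nobody else has voted on stay neutral.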

In step 440, a weighted tag count is determined for each document, or group of documents, with respect to each tag. If a document X has been tagged n times with the term A, the weighted tag count of document X for the term A is an aggregate of all the tags a_1 through a_n that reference document X, whether positive or negative, with the confidence level of each user U_i who created each tag a_i (i = 1 to n) factored in. In addition, if a user enters many tags, that user can be regarded as a frequent tagger, and any given tag from that user is given less weight than a tag from a user who tags less often. Tags are also given less weight if they were created longer ago and relatively more weight if they were created closer to the present time. It will be appreciated that other factors that can be considered in determining the weighted tag count are also considered part of the present invention. Once the weighted tag count of each document for each tag has been determined, the process continues to step 450.
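The three weighting ideas in step 440 (user confidence, a discount for frequent taggers, and a discount for older tags) can be combined in many ways. The sketch below picks one arbitrary combination for illustration; the logarithmic frequency discount, the one-year half-life, and the dictionary keys are assumptions, not formulas from the specification.

```python
import math


def weighted_tag_count(tags, half_life_days=365.0):
    """Sum signed tags for one (document, term) pair, weighting each tag.

    Each tag is a dict with hypothetical keys:
      sign            +1 for an affirming tag, -1 for a negative tag
      confidence      confidence level of the user who entered the tag
      user_tag_count  how many other tags that user has entered
      age_days        how long ago the tag was created
    """
    total = 0.0
    for tag in tags:
        # Frequent taggers count for less; a user with no other tags gets weight 1.
        frequency_discount = 1.0 / (1.0 + math.log1p(tag["user_tag_count"]))
        # Older tags count for less; weight halves every half_life_days.
        recency = 0.5 ** (tag["age_days"] / half_life_days)
        total += tag["sign"] * tag["confidence"] * frequency_discount * recency
    return total


tags_for_x_and_a = [
    {"sign": +1, "confidence": 0.8, "user_tag_count": 0, "age_days": 0},
    {"sign": -1, "confidence": 0.6, "user_tag_count": 0, "age_days": 365},
]
print(round(weighted_tag_count(tags_for_x_and_a), 3))   # prints 0.5
```

Here the fresh affirming tag contributes 0.8 and the year-old negative tag contributes -0.3, so the net weighted count is 0.5.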

In step 450, each term in the corpus is analyzed to establish tag scores for each document, or group of documents, with respect to each term. The process of analyzing each term proceeds to step 460, which analyzes each document, and step 470, which calculates a tag score for each document.

In step 460, the next document for which a tag score is to be calculated is analyzed. All of the information previously collected or calculated about the document is gathered, and additional analysis is then performed. Factors that can be considered include, but are not limited to: the occurrence or density of the term in the document, the position of the term in the document, the presence of multiple terms in the same tag, the presence of the term in the link text of hyperlinks pointing to the document, the time the tag was created or last modified, and the similarity of the term to other terms in the document, which can be based on statistical analysis, clustering analysis, morphological analysis, or other forms of analysis that determine similarity.

In step 470, the tag score of the current document for the current tag is calculated. The tag score for each document is a function of the total number of tags that point to the document, where each tag contributes to the calculation according to the weight applied to it; this is the weighted number of tags determined in step 440 above. The contribution of each tag to the tag score is also proportional to the confidence assessed, as determined in step 430, for the user who entered the tag, and to the analysis of the document completed in step 460. Further, the tag score S_A of document X for term A is a function of: the total number of distinct terms that appear in the tag database (e.g., 340, FIG. 3); the frequency with which term A appears in the tag database; the number of distinct terms with which document X has been tagged; the total number of tags associated with document X; and the number of distinct documents tagged with term A. The combination of these factors is computed so that the tag score of each document is assigned a numerical value. It will be appreciated that other factors that can be included in the calculation are also regarded as part of the present invention. In a preferred embodiment of the present invention, the tag score is tailored to an individual user or to a group of users; in another embodiment, the tag score of each document is the same for every user of the system. Step 470 then returns to step 450 until every document in the corpus has been assigned a tag score for every term, after which the process continues to step 480.
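As a hedged illustration of how such factors might be combined into a single number, the sketch below uses a TF-IDF-style product. This is a stand-in technique chosen for the example, not the formula of the invention, which the description leaves open. The function name and the `total_docs` input (the size of the corpus) are assumptions.

```python
import math


def tag_score(weighted_count, total_tags_on_doc, docs_with_term, total_docs):
    """TF-IDF-style score S_A of one document X for one term A (illustrative)."""
    if weighted_count <= 0 or total_tags_on_doc <= 0 or docs_with_term <= 0:
        return 0.0
    # Share of the document's tags that carry term A, using the weighted
    # number of tags from step 440 as the numerator.
    share = weighted_count / total_tags_on_doc
    # Terms applied to few distinct documents are treated as more specific.
    rarity = math.log(1.0 + total_docs / docs_with_term)
    return share * rarity
```

A per-user or per-group variant, as in the preferred embodiment, could be obtained by computing `weighted_count` with confidence values specific to that user or group.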

In step 480, each document that has been tagged with a given tag is indexed. The tag score of each document for each term is recorded in a format that can be retrieved simply and quickly at search time, in order to determine the relevance, as determined by tag analysis, of all documents to the query term. In addition, the index of documents and their tag scores, tag weights, and user confidence levels can be published as a document index (e.g., 340, FIG. 3) that a search engine can search quickly and simply, in order to calculate the relevance of each document to the query term as determined by tag analysis alone or in combination with other search techniques. It will be appreciated that, in other embodiments of the present invention, some steps of the process 400 can be omitted, other steps can be inserted, different weights can be applied, or different tag scores can be calculated, all while remaining within the scope of the present invention.
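One simple way to record tag scores in a form that is quick to look up by term is an inverted index. The sketch below is illustrative only: the function name, the lower-casing of terms, and the choice to keep each posting list sorted by descending score are assumptions, not details taken from the description.

```python
from collections import defaultdict


def build_index(tag_scores):
    """Build term -> [(document, score), ...] from (term, document, score) triples."""
    index = defaultdict(list)
    for term, document, score in tag_scores:
        # Assumed normalization: terms are matched case-insensitively.
        index[term.lower()].append((document, score))
    for postings in index.values():
        # Highest-scoring documents first, so lookups can be truncated cheaply.
        postings.sort(key=lambda posting: posting[1], reverse=True)
    return dict(index)
```

Keeping the postings sorted means a search application that only needs the top few documents for a term can stop reading early.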

FIG. 5 is a flow chart illustrating a process 500 for calculating results using tag data in accordance with one embodiment of the present invention.

Referring to FIG. 5, in step 510 the search engine processes the query (e.g., 230, FIG. 2), which may contain one or more terms.

In step 520, the search engine generates, on the basis of each term in the query, a list of the documents, or groups of documents, most relevant to the query. The process of determining relevance on the basis of each term continues with step 530, to identify documents; step 540, to determine the tag ranking; and step 550, to score each document. In step 530, for each term in the query, a list of the documents regarded as relevant is generated on the basis of the tags with which those documents have been associated. This list can range in length from very short (e.g., 5 or fewer) to very long (e.g., 10,000,000 or more). It will be appreciated that the list can be truncated for practical purposes in a search application and, depending on the needs of a particular embodiment, can be sorted or unsorted.

In step 540, the tag score of each document is determined with respect to a term or any group of terms. The tag score of each document is a function of the tag score assigned to that document in the index, is influenced by the current confidence levels of the users who submitted the tags used to calculate the tag score, and can differ for individual users or for users who are members of certain groups.

In step 550, on the basis of the query term currently under consideration, each document is scored with a value that determines where in a results list the document is to be placed. Step 550 then returns to step 520 until all terms in the query have been considered.

In step 560, the relevance scores obtained for each document on the basis of the individual terms in the query are combined to calculate the overall relevance score of each document with respect to the complete query as submitted. Next, in step 570, a ranked results list is generated, and in step 580 the results list is delivered to the search engine to be combined with whatever other relevance methods are used (e.g., step 240, FIG. 2). It will be appreciated that, in other embodiments of the present invention, some steps of the process 500 can be omitted or processed in a different order, other steps can be inserted, different weights can be applied, or different tag scores can be calculated, all while remaining within the scope of the present invention.
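The per-term loop of steps 520 through 550 and the combination of step 560 can be sketched as below. The sketch assumes an index that maps each term to a list of (document, score) pairs, assumes that per-term scores are combined by simple summation, and breaks ties by document name; the description only says the scores are "combined", so these are illustrative choices, as is the function name.

```python
def rank(query, index, limit=10):
    """Return up to `limit` (document, overall_score) pairs for a query."""
    totals = {}
    for term in query.lower().split():
        # Steps 530-550: documents associated with this term, and their scores.
        for document, score in index.get(term, []):
            # Step 560 (assumed): combine per-term scores by summation.
            totals[document] = totals.get(document, 0.0) + score
    # Step 570: ranked list, highest overall score first, ties by name.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

The `limit` parameter reflects the remark in step 530 that candidate lists can be truncated for practical purposes.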

FIG. 6 illustrates the hardware components of an Internet search application system 600 used by a user 610 in accordance with the present invention. The system 600 includes a client device 620 coupled over the Internet 630 to a web server 640. The client device 620 can be any device used to access the web server 640 that is configured to communicate using Internet protocols, including, but not limited to, HTTP (the Hypertext Transfer Protocol) and WAP (Wireless Application Protocol). Preferably, the client device 620 is a personal computer, but it can also be another device, including, but not limited to, a handheld device such as a cell phone or a personal digital assistant (PDA), and it is capable of presenting information using standards such as HTML (Hypertext Markup Language), HDML (Handheld Device Markup Language), WML (Wireless Markup Language), or the like.

The web server 640 is coupled to both a search engine 650 and a tag data store 660. The tag data store 660 is coupled to a tag analysis server 670, and the search server 650 is coupled to an index data store 680. Additionally, the tag analysis server 670 is coupled to the index data store 680.

FIG. 7 shows a document index 700 in accordance with one embodiment of the present invention. Those skilled in the art will recognize that the document index 700 is a conceptual structure used to explain the methods of the present invention, and that preferred document indexes use inverted indexes. The document index 700 includes exemplary first and second rows 740 and 750, each of which contains a tag-target pair and related information in columns 705, 710, 715, 720, and 725. Referring to row 740: column 705 contains the tag "U2"; column 710 contains a target, here a hyperlink to a Web page ("U2 Home"); column 715 contains a raw (e.g., unweighted) relevance score (95) for the tag-target pair ("U2"-"U2 Home"); column 720 contains a weight for this tag-target pair; and column 725 contains a user confidence rating for the user who associated the tag "U2" with the target "U2 Home". Row 750 contains similarly identified information. The entry (0.6) in column 720 determines the weight given to the tag "U2" in this tag-target pair. This weight can be determined from the user confidence rating (0.7) in column 725 in combination with other factors, such as the time at which the tag was associated with the target, to arrive at a weight of 0.6. The relevance score for this tag-target pair equals the raw relevance score (95) multiplied by the weight (0.6), which gives a final relevance score of 57. In a similar manner, the relevance score of the tag-target pair in row 750 is determined to be 70 * 0.9, or 63. Therefore, if a user executes a search query containing the term "U2", the target "Rock Band Home Site", corresponding to the target in row 750, is ranked higher in the returned (organized) results list than the target "U2 Home" in row 740, indicating that it is more relevant to the user's search.
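The arithmetic of this example can be checked with a few lines of code. The values are taken from rows 740 and 750 as described; the dictionary layout, the field names, and the rounding to six decimal places (used only to avoid floating-point noise) are assumptions of this sketch.

```python
rows = [
    {"row": 740, "target": "U2 Home", "raw": 95, "weight": 0.6},
    {"row": 750, "target": "Rock Band Home Site", "raw": 70, "weight": 0.9},
]


def final_score(row):
    """Final relevance score: raw relevance score multiplied by the weight."""
    return round(row["raw"] * row["weight"], 6)


# Organize the results list: highest final relevance score first.
ranked = sorted(rows, key=final_score, reverse=True)
for row in ranked:
    print(row["row"], row["target"], final_score(row))
```

Row 750 comes first because 70 * 0.9 = 63 exceeds 95 * 0.6 = 57, even though row 740 has the higher raw score.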

It will be appreciated that the document index 700 is exemplary only. Different combinations of entries, different ranges of relevance scores, and different algorithms for determining relevance scores, to name only a few different configurations, can also be used.

It will be readily apparent to those skilled in the art that various modifications may be made to the embodiments without departing from the spirit and scope of the invention as defined by the appended claims.

200: flow chart

300: components

400: flow chart

500: flow chart

600: hardware components

FIG. 1 is a schematic illustration of a graphical user interface displaying a search results list derived at least in part by analyzing tags, in accordance with one embodiment of the present invention.
FIG. 2 is a flow chart illustrating the operation of an Internet search application capable of applying tags to the process of ranking documents, in accordance with one embodiment of the present invention.
FIG. 3 is a schematic diagram illustrating the components of an Internet search application, in accordance with one embodiment of the present invention.
FIG. 4 is a flow chart illustrating a process for preparing and analyzing tag data, in accordance with one embodiment of the present invention.
FIG. 5 is a flow chart illustrating the steps of calculating results using tag data, in accordance with one embodiment of the present invention.
FIG. 6 is a hardware diagram illustrating the components of an Internet search application, in accordance with one embodiment of the present invention.
FIG. 7 shows a document index, in accordance with one embodiment of the present invention.

Claims (48)

1. A method of determining the relevance of a plurality of targets to a search query, comprising the steps of: associating one or more of the plurality of targets with tags as a result of user input, thereby establishing tag-target pairs, wherein each tag comprises text; receiving the search query at a search engine; using the associations of tags with targets to calculate a relevance score for each of the plurality of targets for the search query; and presenting a results list to a user in response to the search query, the results list comprising one or more of the plurality of targets, wherein the order of the plurality of targets in the results list is influenced by the relevance scores.
2. The method of claim 1, wherein each tag from the plurality of tags comprises one or more terms, the method further comprising associating each term from the one or more terms with a target, thereby defining one or more corresponding term-target pairs.
3. The method of claim 2, further comprising determining a term score for each term-target pair, the term score indicating a degree of relevance between the term and the target.
4. The method of claim 3, wherein calculating a relevance score for a tag-target pair comprises combining the term scores of the term-target pairs for the terms in the tag.
5. The method of claim 4, wherein combining the term scores comprises summing the term scores.
6. The method of claim 4, wherein combining the term scores comprises weighting each term score by a weight and summing the weighted term scores.
7. The method of claim 2, wherein a relevance score for a tag-target pair is determined from: the number of occurrences of a term in the tags that have been associated with the target, the number of tags associated with the target, the number of times the tag has been associated with multiple targets, or any combination of these.
8. The method of claim 2, wherein a relevance score for a tag-target pair is determined from: the number of tag-target pairs that contain a term in the tag, the number of tag-target pairs that contain an association with the target, or both.
9. The method of claim 1, wherein: the association of a tag with one or more of the plurality of targets is performed by a first user, the tag comprising one or more terms; the search query comprises the one or more terms in the tag and is executed by a second user; the results list is presented to the second user; and the relevance score is influenced by a degree of confidence in the first user or in a group to which the first user belongs.
10. The method of claim 9, wherein a relevance score for a target and the search query is determined from the relevance scores of the terms of the search query that appear in the target or are associated with the target.
11. The method of claim 9, wherein a relevance score for a tag-target pair from the plurality of tag-target pairs is determined from: the number of tags that the first user has associated with any target of the plurality of targets, the number of targets that the first and second users have associated with tags, or both.
12. The method of claim 9, wherein the degree of confidence is a confidence rating of the first user.
13. The method of claim 12, wherein the confidence rating is determined from: a rating of the tags that the selected user has associated with targets, a measure of similarity between the search activities of the first and second users, a measure of connection between the first and second users, or any combination of these.
14. The method of claim 9, wherein the degree of confidence is based on how closely a user's tagging behavior conforms to the tagging behavior of other users.
15. The method of claim 9, further comprising marking at least one of the plurality of targets in the results list with a graphical element.
16. The method of claim 1, wherein associating one or more of the plurality of targets with tags comprises: entering the tag in an area presented to a user, rating the tag, blocking a link to the target, selecting the tag, or selecting the target.
17. The method of claim 1, wherein associating one of the plurality of targets with tags comprises analyzing input given by a user performing a search.
18. The method of claim 1, wherein the plurality of targets comprise hyperlinks to Web pages or groups of hyperlinks to Web pages.
19. The method of claim 1, wherein the plurality of targets comprise pointers to text, images, photos, tags, groups of tags, subject areas, concepts, user profiles, answers, audio files, video files, software, or any combination of these.
20. The method of claim 1, wherein a tag crawler associates at least one of the plurality of tags with at least one of the plurality of targets.
21. The method of claim 1, wherein presenting the results list to the user comprises displaying to the user the tags associated with one or more of the plurality of targets in the results list.
22. A method of establishing a system for presenting targets to a user, comprising the steps of: storing, in a tag database, a plurality of tags associated with a plurality of targets as a result of user input; and storing, in an index database, relevance scores between the plurality of tags and the plurality of targets, wherein the relevance scores are used to organize a plurality of documents in an organized results list.
23. The method of claim 22, wherein storing a plurality of tags in a tag database comprises storing terms from the plurality of tags, and wherein storing relevance scores comprises assessing relevance scores that indicate a relevance between the terms and the targets.
24. The method of claim 23, further comprising storing a plurality of indexes in the index database, each index corresponding to a term from the plurality of terms, a corresponding target from the plurality of targets, and a corresponding relevance score between the term and the target.
25. The method of claim 24, wherein each corresponding relevance score between a term and a target is influenced by a degree of confidence in a person, or in a group to which the person belongs, the person having made the association between the tag and the target.
26. The method of claim 24, wherein a relevance score between a term and a target is determined by a statistical classification or rank regression algorithm.
27. The method of claim 26, wherein the statistical classification or rank regression algorithm is any one of logistic regression, support vector machines, classification or regression trees, and boosted tree ensembles.
28. The method of claim 22, further comprising: presenting a results list to a user in response to a search query containing a term; associating, by the user, the term with a target contained in the results list; and determining a relevance score between the term and the target.
29. The method of claim 28, wherein a relevance score between a target from the plurality of targets and a tag associated with it is determined from one or more of the following: the number of times the tag has been associated with the target, the total number of tags associated with the target, the number of times the tag has been associated with any one of the plurality of targets, the number of tags that have been associated with all of the plurality of targets, the time at which the tag was associated with the target, the number of times the target has been bookmarked, the values and number of ratings applied to the target, or any combination of these.
30. A method of organizing a plurality of targets for display in a results list, comprising the steps of: associating terms in a search query with tags associated with a plurality of targets as a result of user input; and returning a results list comprising the plurality of targets organized on the basis of the associations, each association corresponding to a relevance measure.
31. The method of claim 30, further comprising: executing the search query, thereby generating a first list of targets; and organizing the plurality of targets on the basis of the associations.
32. The method of claim 31, further comprising associating tags with targets.
33. The method of claim 30, further comprising applying a statistical classification or rank regression algorithm to the plurality of targets to determine the relevance measures between the plurality of targets and the terms of the search query.
34. The method of claim 33, wherein the statistical classification or rank regression algorithm is any one of logistic regression, support vector machines, classification or regression trees, and boosted tree ensembles.
35. A system for returning a search results list in response to a search query, comprising: a tag database for storing tags associated with targets, wherein one or more targets are associated with tags as a result of user input and each tag comprises text; a tag analyzer coupled to the tag database, wherein the tag analyzer is configured to use the associations of tags with targets to calculate a relevance score for each of the targets for the search query; and a search engine for receiving the search query and for presenting a results list in response to the search query, the results list comprising references to one or more of the targets, wherein the order of the plurality of targets in the results list is influenced by the relevance scores.
36. The system of claim 35, further comprising a target index for storing relevance scores between tags and the targets associated with them, between bookmarks and targets, or both.
37. The system of claim 35, wherein the relevance score is determined by summing weighted relevance scores for the terms that form a tag and the target.
38. The system of claim 35, wherein the relevance score between a search query containing terms and a target is determined from: the number of tags that contain terms in the search query, the number of times a tag contained in the search query is included in the tag database, the number of tags that have been associated with the target, the number of terms that match between the tag and the search query, the number of times the target has been bookmarked, the number of times the target has been rated, or any combination of these.
39. The system of claim 35, wherein the relevance score for a tag and a target is based on: a position of the tag within the target, a frequency of the tag within the target, a density of the tag within the target, or any combination of these.
40. The system of claim 35, wherein the weighting of the relevance score between a tag and a target is based on a degree of confidence assigned to a user who associated the tag with the target.
41. The system of claim 35, wherein relevance scores are determined using a statistical classification or rank regression algorithm, a clustering analysis algorithm, or a morphological analysis algorithm.
42. The system of claim 41, wherein the statistical classification or rank regression algorithm comprises any one of logistic regression, support vector machines, classification or regression trees, and boosted tree ensembles.
43. The system of claim 36, further comprising a search engine coupled to the target index, wherein the search engine is programmed to receive search queries containing terms that correspond to tags and to return an organized results list on the basis of the relevance scores of tag-target pairs.
44. The system of claim 43, further comprising a user database coupled to the search engine, the user database comprising information related to search queries.
45. The system of claim 44, wherein the information related to search queries comprises links followed by a user, tags associated with targets, targets blocked by a user, bookmarks, or any combination of these.
46. The system of claim 35, wherein the targets comprise hyperlinks to Web pages or hyperlinks to groups of Web pages.
47. The system of claim 35, wherein the targets comprise hyperlinks to text, images, photos, tags, groups of tags, subject areas, concepts, user profiles, answers, audio files, video files, software, or any combination of these.
48. The system of claim 35, further comprising a means for associating a tag with a target.
TW95128551A 2005-08-03 2006-08-03 Systems for and methods of finding relevant documents by analyzing tags TWI391834B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US70570405P 2005-08-03 2005-08-03

Publications (2)

Publication Number Publication Date
TW200715152A TW200715152A (en) 2007-04-16
TWI391834B true TWI391834B (en) 2013-04-01

Family

ID=40014928

Family Applications (1)

Application Number Title Priority Date Filing Date
TW95128551A TWI391834B (en) 2005-08-03 2006-08-03 Systems for and methods of finding relevant documents by analyzing tags

Country Status (2)

Country Link
CN (1) CN101283353B (en)
TW (1) TWI391834B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI696081B (en) * 2018-01-08 2020-06-11 香港商阿里巴巴集團服務有限公司 Sample set processing method and device, sample query method and device

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US7941391B2 (en) * | 2007-05-04 | 2011-05-10 | Microsoft Corporation | Link spam detection using smooth classification function
CN101593187B (en) * | 2008-05-30 | 2012-05-30 | 国际商业机器公司 | Method and system for managing book marks
TWI411926B (en) * | 2009-01-05 | 2013-10-11 | Inventec Corp | Generating dynamic web pages system and method thereof
US20120130999A1 (en) * | 2009-08-24 | 2012-05-24 | Jin jian ming | Method and Apparatus for Searching Electronic Documents
US20120002884A1 (en) * | 2010-06-30 | 2012-01-05 | Alcatel-Lucent Usa Inc. | Method and apparatus for managing video content
US8880517B2 (en) * | 2011-02-18 | 2014-11-04 | Microsoft Corporation | Propagating signals across a web graph
US10402407B2 (en) | 2013-06-17 | 2019-09-03 | Lenovo (Singapore) Pte. Ltd. | Contextual smart tags for content retrieval
US20150046418A1 (en) * | 2013-08-09 | 2015-02-12 | Microsoft Corporation | Personalized content tagging
US10769229B2 (en) * | 2016-04-14 | 2020-09-08 | Microsoft Technology Licensing, Llc | Separation of work and personal content
CN105956016A (en) * | 2016-04-21 | 2016-09-21 | 成都数联铭品科技有限公司 | Associated information visualization processing system
CN107463711B (en) * | 2017-08-22 | 2020-07-28 | 山东浪潮云服务信息科技有限公司 | Data tag matching method and device
US10733243B2 (en) * | 2017-08-30 | 2020-08-04 | Microsoft Technology Licensing, Llc | Next generation similar profiles
CN109977318B (en) * | 2019-04-04 | 2021-06-29 | 掌阅科技股份有限公司 | Book searching method, electronic device and computer storage medium
CN110704624B (en) * | 2019-09-30 | 2021-08-10 | 武汉大学 | Geographic information service metadata text multi-level multi-label classification method
CN111125566B (en) * | 2019-12-11 | 2021-08-31 | 贝壳找房(北京)科技有限公司 | Information acquisition method and device, electronic equipment and storage medium
CN112100506B (en) * | 2020-11-10 | 2021-03-16 | 中国电力科学研究院有限公司 | Information pushing method, system, equipment and storage medium
CN116431686B (en) * | 2023-06-08 | 2023-09-01 | 成都航空职业技术学院 | Training data query method and system based on heterogeneous archives

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US6070176A (en) * | 1997-01-30 | 2000-05-30 | Intel Corporation | Method and apparatus for graphically representing portions of the world wide web
US20030078914A1 (en) * | 2001-10-18 | 2003-04-24 | Witbrock Michael J. | Search results using editor feedback
US6601075B1 (en) * | 2000-07-27 | 2003-07-29 | International Business Machines Corporation | System and method of ranking and retrieving documents based on authority scores of schemas and documents
US20040254917A1 (en) * | 2003-06-13 | 2004-12-16 | Brill Eric D. | Architecture for generating responses to search engine queries

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US6360215B1 (en) * | 1998-11-03 | 2002-03-19 | Inktomi Corporation | Method and apparatus for retrieving documents based on information other than document content
US6718365B1 (en) * | 2000-04-13 | 2004-04-06 | International Business Machines Corporation | Method, system, and program for ordering search results using an importance weighting
US6983280B2 (en) * | 2002-09-13 | 2006-01-03 | Overture Services Inc. | Automated processing of appropriateness determination of content for search listings in wide area network searches

Also Published As

Publication number | Publication date
TW200715152A (en) | 2007-04-16
CN101283353B (en) | 2015-11-25
CN101283353A (en) | 2008-10-08
