TWI681304B - System and method for adaptively adjusting related search words - Google Patents
- Publication number
- TWI681304B (application TW107145181A)
- Authority
- TW
- Taiwan
- Prior art keywords
- search
- word
- text
- related word
- threshold
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
The present invention relates to a system and method for adaptively adjusting related search terms.
Modern search systems usually return, along with the search results, other search terms related to the user's search term, to help the user quickly clarify the query target. This is needed for several reasons: the keywords a user enters often cannot describe the search intent precisely in a few words; the search term or search target can be described in several ways or is ambiguous, so the vocabulary of the user and of the text do not match; the user lacks understanding or knowledge of the target and uses the wrong search term; or the user makes typing errors such as homophones or near-homophones. In general, techniques for extracting related search terms can be divided by data source into methods based on indexed text content and methods based on historical query records. A text-based method can provide a list of suggested related search terms as soon as the search system goes online, by analyzing the correlations between words in the indexed text content. Its drawback is that it can only make suggestions from fixed text content and cannot analyze the historical query records accumulated later to predict the user's search intent. A method based on historical query records can provide up-to-date predictions of search intent from continuously accumulated user data, and thus a better list of suggested related search terms, but it cannot provide suggestions in the early stage of the system: a long period of user activity is needed to accumulate enough data for analysis. Conventional methods also use weight integration to combine the two approaches, so that related search terms can be recommended both in the early stage, when the search system has no user history, and in the later stage, when enough history has accumulated.
However, weight integration has its own problem: the data source for the weight combination. Manually set weights often fail to achieve the best result, and enough search record data usually has to be accumulated before a first set of optimal weights can be trained with a statistical model or machine learning. Transfer learning across different vertical domains also remains difficult. The extraction techniques above are therefore each suited to search systems at a different stage after launch. Because the amount of search records differs between stages, none of them can provide related search terms suitable for suggestion to users at all times, and an improvement is needed.
The present invention relates to a system and method for adaptively adjusting related search terms that can self-adjust the related search terms according to the number of search records accumulated by the system, so as to provide related search terms suitable for suggestion to users.
According to one aspect of the invention, a system for adaptively adjusting related search terms is provided, comprising an input device, a record collection module, a threshold setting module, and an evolution module. The input device receives user input and produces a search term. The record collection module determines whether the cumulative search count of the search term is greater than a first threshold or less than a second threshold. The threshold setting module sets the number of search records that satisfies the first or second threshold. The evolution module adjusts a search process according to the number of search records. When the cumulative search count of the search term is greater than the first threshold, the evolution module finds, from a historical search record, at least one historical search term related to the content or attributes of the search term. When the cumulative search count is less than the second threshold, the evolution module performs an initial search process to find, in a text, at least one related word related to the content or attributes of the search term. When the cumulative search count lies between the first and second thresholds, the evolution module optimizes a mid-term search process to further find, in the text and in the historical search record, at least one related word and/or at least one historical search term whose relevance to the content or attributes of the search term is maximized.
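The three-way choice made by the evolution module can be sketched as a small Python function. This is only an illustration under stated assumptions: the names `Stage` and `select_stage` and the example threshold values are hypothetical, and assigning a count that exactly equals a threshold to the mid stage is a choice of this sketch, since the description does not cover that case.

```python
from enum import Enum


class Stage(Enum):
    EARLY = "early"  # related words come from the indexed text only
    MID = "mid"      # text and historical search records are combined
    LATE = "late"    # related words come from historical search records only


def select_stage(search_count: int, first_threshold: int, second_threshold: int) -> Stage:
    """Pick the search process from a term's cumulative search count.

    first_threshold is the upper bound (late stage above it) and
    second_threshold is the lower bound (early stage below it).
    """
    if second_threshold > first_threshold:
        raise ValueError("second_threshold must not exceed first_threshold")
    if search_count > first_threshold:
        return Stage.LATE
    if search_count < second_threshold:
        return Stage.EARLY
    # Assumption of this sketch: counts equal to a threshold fall in the mid stage.
    return Stage.MID
```

With hypothetical thresholds of 100 and 30, a term searched 5 times would be handled by the early process, one searched 50 times by the mid-term process, and one searched 500 times by the late process.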
According to another aspect of the invention, a method for adaptively adjusting related search terms is provided, comprising the following steps. An input process receives user input and produces a search term. A record collection process determines whether the cumulative search count of the search term is greater than a first threshold or less than a second threshold. A threshold setting process sets the number of search records that satisfies the first or second threshold. An evolution process adjusts a search process according to the number of search records. When the cumulative search count of the search term is greater than the first threshold, the evolution process finds, from a historical search record, at least one historical search term related to the content or attributes of the search term. When the cumulative search count is less than the second threshold, the evolution process performs an initial search process to find, in a text, at least one related word related to the content or attributes of the search term. When the cumulative search count lies between the first and second thresholds, the evolution process optimizes a mid-term search process to further find, in the text and in the historical search record, at least one related word and/or at least one historical search term whose relevance to the content or attributes of the search term is maximized.
To provide a better understanding of the above and other aspects of the invention, embodiments are described in detail below with reference to the accompanying drawings:
Embodiments are described in detail below. They serve only as examples and are not intended to limit the scope of protection of the invention. The same or similar reference numerals denote the same or similar elements. Directional terms used in the following embodiments, such as up, down, left, right, front, or back, refer only to the directions in the accompanying drawings. They are used for illustration and not to limit the invention.
According to an embodiment of the invention, a system for adaptively adjusting related search terms is provided, for example a search engine with a self-adjusting search process. For a search engine that has just adopted the system and has not yet accumulated enough search records, the system can, in the early stage, use the indexed text and the index word list to find at least one related word in the text that is related to the textual content or characteristic attributes of the search term, and build an early-stage related search term list. In the middle stage, after a certain number of search records have accumulated, the system can use those historical search records together with the text indexed earlier to find, in the text and in the historical search records, at least one related word and/or at least one historical search term whose relevance to the content or attributes of the search term is maximized, and build a mid-stage related search term list. In the late stage, after enough search records have accumulated, the system can take the search term entered by the user and directly find at least one historical search term related to its content or attributes, and build a late-stage related search term list.
As described above, the system optimizes itself according to the number of search records accumulated in each period, so that its evolution module can move smoothly from the early stage, which has no user behavior records (search records), to the late stage, which relies mainly on user behavior records (search records), and thereby provide related search terms suitable for suggestion to users.
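Referring to FIG. 1, according to an embodiment of the invention, the system 100 for adaptively adjusting related search terms includes an input device 110, a record collection module 120, a threshold setting module 130, and an evolution module 140. The input device 110 receives user input and produces a search term 112. The record collection module 120 determines whether the cumulative search count of the search term 112 is greater than a first threshold or less than a second threshold (shown as threshold 132). The threshold setting module 130 sets the number of search records that satisfies the first or second threshold. The evolution module 140 adjusts a search process according to the number of search records.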
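In one embodiment, the input device 110 may be a user interface that reads data entered by the user, including text, symbols, and/or speech. Taking a computer or remote server as an example, the input device 110 may be a handheld electronic device connected to the computer or remote server, although the invention is not limited to this. The input device 110 passes the search term 112 that the user wants to look up to the computer or remote server, and the search engine 102 incorporating the system 100 then searches an online or local text database. The databases may include a record database 124 and a text database 126. The text database 126 stores the sources of the text 114 to be searched, including text files and/or database fields. Text files include, for example, product manual files, advertising copy files, product test report files, and web page files. Database fields include, for example, the data fields of a product database, such as product name, keywords, product description, and brand. The record database 124 stores the users' historical search records 122.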
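The record collection module 120 collects the user's operations on the system 100, including the search terms entered, click positions, click counts, and browsing time, together with the content or attributes of each search term 112. Once collected, these data become the historical search records 122 and are stored in the record database 124. The content or attributes of a search term 112 may be a product's Chinese name, English name, abbreviation, brand, model, function, the names of other brands, and so on; the invention is not limited to these. They may be determined from dictionary word senses, user-defined semantics, manually edited open data (such as Wikipedia, DBpedia, or the Open Directory Project), or statistical named entity recognition. Once the content or attributes of the search term 112 have been determined, the system 100 uses them to look for related words 148.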
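In addition, the system 100 may use the search engine 102 to parse the search term 112 and reconstruct its syntax, filtering out words in the text 114 and/or the historical search records 122 that are unrelated to the content or attributes of the search term 112, to ensure that the retrieved data are correct and complete.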
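In addition, the threshold setting module 130 sets the number of search records that satisfies the first or second threshold. The number of search records is not limited to the count accumulated for search terms 112 with identical wording; it may also be the count accumulated for search terms 112 of the same type that differ in wording but are close in meaning. When different users search for search terms 112 of the same type or for similar search terms 112, the system 100 may sum or weight the search records of those terms. When the number of search records accumulated by the system 100 reaches a threshold 132, the evolution module 140 adaptively adjusts the search process according to the number of search records, as shown in FIGS. 2, 3, and 4.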
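One way to picture the summing or weighting of records for similar search terms is the following Python sketch. It is an assumption-laden illustration, not the patented implementation: `accumulate_counts`, the `term_group` mapping, and the `term_weight` mapping are hypothetical names, and how two terms are judged similar is outside the sketch.

```python
from collections import defaultdict


def accumulate_counts(query_log, term_group, term_weight=None):
    """Sum search counts per group of similar search terms.

    query_log:   iterable of raw search terms, one entry per search.
    term_group:  dict mapping a term to the label of its group
                 (hypothetical; supplied by some similarity analysis).
    term_weight: optional dict mapping a term to its weight (default 1.0).
    """
    term_weight = term_weight or {}
    totals = defaultdict(float)
    for term in query_log:
        # Terms without a group are counted under their own wording.
        group = term_group.get(term, term)
        totals[group] += term_weight.get(term, 1.0)
    return dict(totals)
```

The resulting per-group total could then be compared against the thresholds in place of a single term's raw count.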
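Referring to FIG. 1, the system 100 may further include a word segmentation module 146, a record related word generation module 160, and a text related word generation module 150. The index word list 144 contains a list of strings, each consisting of one or more alphanumeric characters or symbols. The index word list may be preset manually, may be a general-purpose or domain-specific dictionary, may be compiled from all the string phrases obtained when the word segmentation module 146 analyzes the content of the text 114, or may be a combination of these, for example a domain-specific dictionary merged with all the words obtained by segmenting the text with the word segmentation module 146. The content of the text 114 may be documents, web pages, or designated tables or fields of a database. For example, if the search system targets products, the content of the text 114 may be database fields such as the product name, product description, and product keywords in the product table of a product database, as well as the content of product description web pages.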
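The word segmentation module 146 divides the search term 112 entered by the user (for example, a Chinese phrase) into meaningful word groups. For example, if the user enters the search term 112 "晶片讀卡機" (chip card reader), the word segmentation module 146 may split it into "晶片" (chip) and "讀卡機" (card reader), or keep only "讀卡機" (card reader). Thus, when the search term 112 does not appear in the text 114, the word segmentation module 146 breaks it down into at least one index word by byte parsing, word parsing, or word matching against the index word list 144, so that the search engine 102 can go on to search for the index words that appear in the text 114. Chinese text may be segmented with a dictionary-based segmentation algorithm, forward maximum matching, reverse maximum matching, or bidirectional maximum matching, or with a corpus-based statistical segmentation algorithm such as conditional random fields (CRF) or deep neural networks (DNN); the invention is not limited to these.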
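Of the segmentation algorithms listed, forward maximum matching is the simplest to illustrate. The Python sketch below is a minimal version under stated assumptions: the function name `forward_max_match` and the tiny vocabulary are hypothetical, and a character that matches no dictionary entry is emitted as a single-character token.

```python
def forward_max_match(text, vocabulary, max_len=None):
    """Segment text greedily from left to right, longest dictionary entry first."""
    if max_len is None:
        max_len = max((len(word) for word in vocabulary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan keeps moving.
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens
```

With the hypothetical vocabulary `{"晶片", "讀卡機", "讀卡"}`, calling `forward_max_match("晶片讀卡機", ...)` returns `["晶片", "讀卡機"]`, which matches the split described above.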
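In addition, the text related word generation module 150 uses the index word list 144 to find the M index words in the text 114 most related to the search term 112 and produces a text related word list 152. M is, for example, a positive integer equal to or greater than 5. In one embodiment, the text related word generation module 150 computes a relatedness strength from the probabilities that the search term 112 and an index word appear separately or together in the text 114. The higher the strength, the more closely the two are related; the lower the strength, the more weakly they are related. The strength may be computed by association rule learning, pointwise mutual information (PMI), improved PMI variants, Kullback–Leibler divergence, normalized Google distance, or WordNet-based distance; the invention is not limited to these.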
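As a rough illustration of scoring by separate and joint occurrence, the Python sketch below computes pointwise mutual information over a document collection and keeps the top M index words. The representation of each document as a set of index words, the function names `pmi` and `top_related`, and the sample data are all assumptions of this sketch.

```python
import math


def pmi(term_a, term_b, documents):
    """Pointwise mutual information of two terms over a document collection.

    documents: list of sets of index words, one set per document.
    Returns None when the two terms never occur in the same document.
    """
    n = len(documents)
    count_a = sum(1 for doc in documents if term_a in doc)
    count_b = sum(1 for doc in documents if term_b in doc)
    count_ab = sum(1 for doc in documents if term_a in doc and term_b in doc)
    if n == 0 or count_ab == 0:
        return None
    # PMI = log( P(a, b) / (P(a) * P(b)) )
    return math.log((count_ab / n) / ((count_a / n) * (count_b / n)))


def top_related(query, index_words, documents, m=5):
    """Return up to m index words ranked by PMI with the query, highest first."""
    scored = []
    for word in index_words:
        if word == query:
            continue
        score = pmi(query, word, documents)
        if score is not None:
            scored.append((word, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:m]]
```

Raw PMI tends to favor rare words, which is one reason the description also mentions improved PMI variants; this sketch does not attempt any such correction.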
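In addition, the record related word generation module 160 analyzes the degree of relatedness between any two historical search terms in the historical search records 122, finds the N historical search terms most related to the search term 112, and produces a record related word list 162. N is, for example, a positive integer equal to or greater than 5. In one embodiment, the record related word generation module 160 computes a relatedness strength from the probabilities that the content or attributes of the current search term 112 and of a historical search term appear separately or together in the historical search records 122. The higher the strength, the more closely the two are related; the lower the strength, the more weakly they are related. Besides comparing where the words occur, the degree of relatedness may also be computed from other attributes of the search terms in the historical search records 122, such as click position, click count, and browsing time. The strength is computed, for example, by pointwise mutual information (PMI), but other algorithms may also be used, such as association rule learning, improved PMI variants, Kullback–Leibler divergence, normalized Google distance, or WordNet-based distance; the invention is not limited to these.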
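To show one of the alternative measures named above, the following Python sketch applies the normalized Google distance to historical searches. It rests on assumptions that are not in the description: searches are grouped into sessions, each session is a set of search terms, and the names `normalized_google_distance` and `top_history_terms` are hypothetical. Unlike PMI, a smaller distance means a closer relation.

```python
import math


def normalized_google_distance(term_a, term_b, sessions):
    """Normalized Google distance between two search terms.

    sessions: list of sets, each holding the terms searched in one session.
    Returns None when the terms never co-occur or the distance is undefined.
    """
    n = len(sessions)
    f_a = sum(1 for s in sessions if term_a in s)
    f_b = sum(1 for s in sessions if term_b in s)
    f_ab = sum(1 for s in sessions if term_a in s and term_b in s)
    if f_ab == 0:
        return None
    denominator = math.log(n) - min(math.log(f_a), math.log(f_b))
    if denominator == 0:
        return None  # both terms occur in every session
    numerator = max(math.log(f_a), math.log(f_b)) - math.log(f_ab)
    return numerator / denominator


def top_history_terms(query, sessions, n=5):
    """Return up to n historical terms closest to the query, nearest first."""
    candidates = set().union(*sessions) - {query}
    scored = []
    for term in candidates:
        distance = normalized_google_distance(query, term, sessions)
        if distance is not None:
            scored.append((distance, term))
    scored.sort()  # ascending distance; ties broken alphabetically
    return [term for _, term in scored[:n]]
```

Attributes such as click count or browsing time could be folded in as weights on each session, but the description does not say how, so the sketch leaves them out.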
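Referring to FIG. 1, to optimize the mid-term search process the system 100 further includes a related word discrimination calculation module 170 and a related word recommendation module 174. The related word discrimination calculation module 170 computes a discrimination value 172 for each related word 148 from the text 114, the index word list 144, the record related word list 162, and the text related word list 152. The discrimination value 172 measures how distinctive a related word 148 is, that is, how strongly the word differentiates among the texts 114. It can also be used to diversify the related word list and avoid recommending related words that are too similar. When a related word 148 appears in only one text 114, its discrimination value is higher; when it appears in many texts 114, its discrimination value is lower. For example, across multiple texts 114, the distinctiveness of a related word 148 is inversely related to the number of texts in which it appears (document frequency, DF); this is the inverse document frequency (IDF). The related word discrimination calculation module 170 may therefore use, for example, inverse document frequency, residual inverse document frequency (RIDF), or discrimination power to compute the discrimination value 172 of each related word 148 and build a table matching related words 148 to discrimination values 172; the invention is not limited to these.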
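The inverse relation between distinctiveness and document frequency can be made concrete with the common logarithmic form of IDF. The Python sketch below is a minimal illustration; the function name, the set-of-words document representation, and the choice of the unsmoothed formula log(N / DF) are assumptions, since the description does not fix a particular variant.

```python
import math


def inverse_document_frequency(term, documents):
    """IDF = log(N / DF), where DF is the number of documents containing the term.

    documents: list of sets of index words, one set per document.
    Returns None when the term appears in no document.
    """
    df = sum(1 for doc in documents if term in doc)
    if df == 0:
        return None
    return math.log(len(documents) / df)
```

A word that occurs in every document scores 0, while a word confined to a single document receives the highest score in the collection, which mirrors the behavior described for the discrimination value.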
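In one embodiment, when a related word 148 is in the index word list 144, the related word discrimination calculation module 170 computes the discrimination value of that index word directly. When a related word 148 is not in the index word list 144, the word segmentation module 146 first segments it, the related word discrimination calculation module 170 computes a discrimination value for each resulting index word, and these values are combined by taking the minimum, maximum, arithmetic mean, or weighted mean to estimate the discrimination value of the related word 148.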
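The fallback for related words that are missing from the index word list might look like the following Python sketch. All names here are hypothetical: `index_values` stands for a precomputed table of discrimination values, `segment` for any segmentation callable, and the per-part `weights` for the weighted mean are an assumption because the description does not say where the weights come from.

```python
def estimate_discrimination(related_word, index_values, segment, mode="min", weights=None):
    """Estimate a related word's discrimination value.

    index_values: dict mapping index words to discrimination values.
    segment:      callable that splits a word into a list of index words.
    mode:         "min", "max", "mean", or "weighted".
    weights:      optional dict of per-part weights for mode="weighted".
    """
    if related_word in index_values:
        return index_values[related_word]  # indexed word: use its value directly
    parts = [part for part in segment(related_word) if part in index_values]
    if not parts:
        return None  # nothing to estimate from
    values = [index_values[part] for part in parts]
    if mode == "min":
        return min(values)
    if mode == "max":
        return max(values)
    if mode == "mean":
        return sum(values) / len(values)
    if mode == "weighted":
        weights = weights or {}
        part_weights = [weights.get(part, 1.0) for part in parts]
        return sum(v * w for v, w in zip(values, part_weights)) / sum(part_weights)
    raise ValueError(f"unknown mode: {mode}")
```

Taking the minimum is the conservative choice, since a compound word is then only as distinctive as its most common part; the maximum does the opposite.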
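In one embodiment, the system 100 further includes a new word recognition module 142 that extracts from a given word any new words not contained in the index word list. It may work through linguistic rules such as phonological, grammatical, or morphological rules; through statistical models such as hidden Markov models (HMM), conditional random fields (CRF), support vector machines (SVM), or deep neural networks (DNN); or through specific statistics such as pointwise mutual information (PMI). When a related word 148 is not in the index word list 144, the new word recognition module 142 extracts the substring recognized as a new word and assigns it an estimated discrimination value. This value may be a preset fixed number, or may be derived dynamically as the maximum discrimination value of all words in the index word list 144 or a weighted value of that maximum. The remaining, non-new-word substring of the related word is still evaluated against the index word list 144. If that substring is in the index word list 144, the related word discrimination calculation module 170 computes its discrimination value directly. If it is not, the word segmentation module 146 segments it into at least one index word and the related word discrimination calculation module 170 computes a discrimination value for each. In either case, the discrimination values of the new-word and non-new-word parts are then combined by taking the minimum, maximum, arithmetic mean, or weighted mean to estimate the discrimination value of the related word 148.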
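In addition, the related word recommendation module 174 compares the discrimination values of the related words 148 in the record related word list 162 with those of the related words 148 in the text related word list 152, ranks the related words 148 by discrimination value, and selects the top P related words 148 from the two lists. P is, for example, a positive integer equal to or greater than 5. This completes the related search term list 176 suitable for suggestion.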
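The selection step can be sketched in Python as a merge followed by a sort. The function name `recommend`, the deduplication of words that appear in both lists, and the skipping of words without a discrimination value are assumptions of this sketch rather than details given in the description.

```python
def recommend(text_related, record_related, discrimination, p=5):
    """Merge both candidate lists and keep the p most discriminative words.

    text_related, record_related: lists of candidate related words.
    discrimination: dict mapping a word to its discrimination value.
    """
    seen = set()
    candidates = []
    for word in list(text_related) + list(record_related):
        # Keep each word once and skip words with no known value.
        if word not in seen and word in discrimination:
            seen.add(word)
            candidates.append(word)
    # sorted() is stable, so words with equal values keep their list order.
    ranked = sorted(candidates, key=lambda word: discrimination[word], reverse=True)
    return ranked[:p]
```

Because ranking uses only the discrimination value, a word's relatedness score from the earlier steps affects the result only through its presence in one of the two candidate lists.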
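Referring to FIGS. 1 and 2, FIG. 2 is a schematic diagram of the initial search process performed by the system 100 for adaptively adjusting related search terms according to an embodiment of the invention, comprising steps S11 to S14. In steps S11 and S12, it is determined whether the search term 112 is in the search records and, if so, whether its cumulative search count is less than the second threshold. When both conditions are met, the evolution module 140 performs an initial search process. At this point, because the search term 112 is absent from the historical search records 122 or its cumulative search count is very low, the search engine 102 cannot find historical search terms suitable for suggestion from the current search term 112. In steps S13 and S14, it is determined whether the search term 112 is in a text 114. If it is not, the word segmentation module 146 breaks the search term 112 down into at least one index word according to the index word list 144, and the flow returns to step S11 to determine whether the index word is in the search records. When the search term 112 is in a text 114, the text related word generation module 150 uses the built-in text 114 and the index word list 144 to find at least one related word 148 in the text 114 that is related to the content or attributes of the search term 112.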
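Next, referring to FIGS. 1 and 3, FIG. 3 is a schematic diagram of the system 100 for adaptively adjusting related search terms optimizing the mid-term search process according to an embodiment of the invention. The steps are the same as in the embodiment above, except that in step S12, when the cumulative search count of the search term 112 is greater than the second threshold and less than the first threshold, the system 100 has accumulated a certain number of search records and the evolution module 140 can perform a mid-term search process. The record related word generation module 160 finds, from the historical search records 122, at least one historical search term related to the content or attributes of the search term 112. The search engine 102 can therefore find related words 148 suitable for suggestion both from the current search term 112 and from the built-in text 114 and index word list 144. The related word discrimination calculation module 170 and the new word recognition module 142 then produce discrimination values for the related words, and the related word recommendation module 174 selects among them to find at least one related word 148 and/or at least one historical search term whose relevance to the content or attributes of the search term 112 is maximized, yielding an optimized related search term list 176.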
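Next, referring to FIGS. 1 and 4, FIG. 4 is a schematic diagram of the late search process performed by the system 100 for adaptively adjusting related search terms according to an embodiment of the invention. It omits the text search of steps S13 and S14 used in the early stage and performs only the determinations of steps S11 and S12. In this embodiment, when the search term 112 appears in the historical search records 122 and its cumulative search count is greater than both the first and second thresholds, the system 100 has accumulated enough historical search records 122 for the evolution module 140 to perform a late search process. The record related word generation module 160 finds, from the historical search records 122, at least one historical search term related to the content or attributes of the search term 112. The search engine 102 therefore does not need the built-in text 114 and index word list 144 to find related words 148 suitable for suggestion; it finds them directly in the historical search records 122 from the current search term 112. The first and second thresholds are cumulative search counts of the search term 112. They may be set according to the general statistical notion of a large sample (sample size greater than 30), or according to search systems of the same domain and similar scale. In shopping search, for example, they may be set from cases with a similar number of products, using the cumulative search count that was needed there for the record related words to satisfy users. Alternatively, while the search system 100 is in use, domain experts may adjust the first and second thresholds dynamically according to the search results, to control how quickly the system evolves from the early stage to the late stage, or to fall back from the middle or late stage to the preceding stage.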
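In one embodiment, the above method for adaptively adjusting related search terms may be implemented as a software program stored on a non-transitory computer readable medium, such as a hard disk, optical disc, flash drive, memory, or other program storage device. When a processor loads the program from the non-transitory computer readable medium, it can execute the method flows of FIGS. 2, 3, and 4, evolving an initial search process into a mid-term search process and then a mid-term search process into a late search process.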
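In one embodiment, the system 100 for adaptively adjusting related search terms may include a processor and a program storage device. The processor can execute one or more computer-executable instructions, and the program storage device stores computer program modules executable by the processor. When executed by the processor, the computer program modules cause the processor to perform the steps shown in FIGS. 2, 3, and 4.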
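In another embodiment, the record collection module 120, threshold setting module 130, evolution module 140, new word recognition module 142, text related word generation module 150, record related word generation module 160, related word discrimination calculation module 170, and related word recommendation module 174 may each be implemented as a software unit or a hardware unit, or some modules may be combined and implemented in software while others are combined and implemented in hardware. A module implemented in software can be regarded as an operating process (a record collection process, threshold setting process, evolution process, new word recognition process, text related word generation process, record related word generation process, related word discrimination calculation process, related word recommendation process, and so on) that is loaded by a processor to perform the corresponding function. A module implemented in hardware may be, for example, a microcontroller, microprocessor, digital signal processor, application specific integrated circuit (ASIC), digital logic circuit, or field programmable gate array (FPGA).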
The system and method for adaptively adjusting related search terms disclosed in the above embodiments can self-adjust the related search terms according to the number of search records accumulated by the system, so as to provide related search terms suitable for suggestion to users. This reduces the labor and time cost of system development, removes the need to learn a first set of weights in advance, and avoids the problem of transfer learning across vertical domains. The invention also takes into account that the search term recommendation process can keep evolving as the search records change, and establishes a more accurate recommendation mechanism. This avoids the problem that a single fixed recommendation process may produce related words unrelated to the content or attributes of the search term, and it makes management more convenient and use more flexible.
In summary, although the invention has been disclosed above by way of embodiments, they are not intended to limit it. Those of ordinary skill in the art to which the invention pertains may make various changes and refinements without departing from its spirit and scope. The scope of protection of the invention is therefore defined by the appended claims.
FIG. 1 is a schematic diagram of a system for adaptively adjusting related search terms according to an embodiment of the invention.
FIG. 2 is a schematic diagram of the initial search process performed by the system for adaptively adjusting related search terms according to an embodiment of the invention.
FIG. 3 is a schematic diagram of the system for adaptively adjusting related search terms optimizing the mid-term search process according to an embodiment of the invention.
FIG. 4 is a schematic diagram of the late search process performed by the system for adaptively adjusting related search terms according to an embodiment of the invention.
100‧‧‧System for adaptively adjusting related search terms
102‧‧‧Search engine
110‧‧‧Input device
112‧‧‧Search term
114‧‧‧Text
120‧‧‧Record collection module
122‧‧‧Historical search record
124‧‧‧Record database
126‧‧‧Text database
130‧‧‧Threshold setting module
132‧‧‧Threshold
140‧‧‧Evolution module
142‧‧‧New word recognition module
144‧‧‧Index word list
146‧‧‧Word segmentation module
148‧‧‧Related word
150‧‧‧Text related word generation module
152‧‧‧Text related word list
160‧‧‧Record related word generation module
162‧‧‧Record related word list
170‧‧‧Related word discrimination calculation module
172‧‧‧Discrimination value
174‧‧‧Related word recommendation module
176‧‧‧Related search term list
Claims (18)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW107145181A TWI681304B (en) | 2018-12-14 | 2018-12-14 | System and method for adaptively adjusting related search words |
CN201910088844.9A CN111324705B (en) | 2018-12-14 | 2019-01-29 | System and method for adaptively adjusting associated search terms |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW107145181A TWI681304B (en) | 2018-12-14 | 2018-12-14 | System and method for adaptively adjusting related search words |
Publications (2)
Publication Number | Publication Date |
---|---|
TWI681304B true TWI681304B (en) | 2020-01-01 |
TW202022635A TW202022635A (en) | 2020-06-16 |
Family
ID=69942676
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW107145181A TWI681304B (en) | 2018-12-14 | 2018-12-14 | System and method for adaptively adjusting related search words |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111324705B (en) |
TW (1) | TWI681304B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI755995B (en) * | 2020-12-24 | 2022-02-21 | 科智企業股份有限公司 | A method and a system for screening engineering data to obtain features, a method for screening engineering data repeatedly to obtain features, a method for generating predictive models, and a system for characterizing engineering data online |
TWI787651B (en) * | 2020-09-16 | 2022-12-21 | 洽吧智能股份有限公司 | Method and system for labeling text segment |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI742446B (en) * | 2019-10-08 | 2021-10-11 | 東方線上股份有限公司 | Vocabulary library extension system and method thereof |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW200921422A (en) * | 2007-07-31 | 2009-05-16 | Yahoo Inc | System and method for determining semantically related terms |
US20100262603A1 (en) * | 2002-02-26 | 2010-10-14 | Odom Paul S | Search engine methods and systems for displaying relevant topics |
CN102184173A (en) * | 2009-10-31 | 2011-09-14 | 佛山市顺德区汉达精密电子科技有限公司 | Method for searching Internet data |
CN102629257A (en) * | 2012-02-29 | 2012-08-08 | 南京大学 | Commodity recommending method of e-commerce website based on keywords |
CN103077179A (en) * | 2011-09-12 | 2013-05-01 | 吉菲斯股份有限公司 | A computer-implemented method for displaying an individual timeline of a user of a social network, computer system and computer readable medium thereof |
TWI421713B (en) * | 2008-02-28 | 2014-01-01 | Yahoo Inc | System and/or method for personalization of searches |
CN105930376A (en) * | 2016-04-12 | 2016-09-07 | 广东欧珀移动通信有限公司 | Search method and device |
US20170169111A1 (en) * | 2015-12-09 | 2017-06-15 | Oracle International Corporation | Search query task management for search system tuning |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103365839B (en) * | 2012-03-26 | 2017-12-12 | 深圳市世纪光速信息技术有限公司 | The recommendation searching method and device of a kind of search engine |
GB201418402D0 (en) * | 2014-10-16 | 2014-12-03 | Touchtype Ltd | Text prediction integration |
CN105653533B (en) * | 2014-11-13 | 2019-10-25 | 腾讯数码(深圳)有限公司 | A kind of method and apparatus updating classification associated set of words |
CN106649334B (en) * | 2015-10-29 | 2020-09-15 | 北京国双科技有限公司 | Processing method and device of associated word set |
- 2018-12-14: TW, TW107145181A, patent TWI681304B, active
- 2019-01-29: CN, CN201910088844.9A, patent CN111324705B, active
Also Published As
Publication number | Publication date |
---|---|
CN111324705A (en) | 2020-06-23 |
TW202022635A (en) | 2020-06-16 |
CN111324705B (en) | 2023-05-02 |