TW201131395A - Method and apparatus for carrying out sorting the search result - Google Patents


Info

Publication number
TW201131395A
Authority
TW
Taiwan
Prior art keywords
word
weight
string
target
query
Prior art date
Application number
TW99106782A
Other languages
Chinese (zh)
Other versions
TWI486797B (en)
Inventor
Yu-heng Xie
Fei Xing
Ning Guo
Lei Hou
Qin Zhang
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to TW099106782A
Publication of TW201131395A
Application granted
Publication of TWI486797B


Abstract

The present application discloses a method and apparatus for sorting search results. In the method, a server pre-computes the semantic association weight between every two words in a statistical sample, and obtains and stores a word weight table. The method further includes: the server receives a query string entered at a user terminal, searches according to the query string, and obtains target strings; the server segments the query string and each target string into words and pairs each segment of the query string in turn with each segment of the target string; the word weight table is looked up to obtain the weight of each segment pair; and a weighted word length is obtained from the weights, the target strings are sorted by weighted word length, and the result is returned to the user terminal. By introducing word weights that express the semantic relatedness between the query string and the target string, the present application reflects more accurately how well each target string matches the query string. It is simple to apply in practice and works well.

Description

VI. Description of the Invention

[Technical Field]

The present application relates to the field of computer data processing, and in particular to a method and apparatus for sorting search results.

[Prior Art]

A search engine needs to estimate how well a search result (the target string) matches a query string from the positional distance between the query's words as they occur in the result. Results in which the words lie close together generally match better and are therefore ranked higher. For example, if the query string is "disinfection machine", a result containing "disinfection machine" is usually closer to the user's intent than "disinfection industrial washing machine", which in turn is closer than "disinfection equipment, dehydrator, dryer". All of this affects the ranking of the search results.
A conventional way to measure the distance between the words of a query string within a target string is the minimum sliding window: find the shortest interval of the target string that contains every word of the query string, and use the length of that interval to describe how close together the query words lie in the target string. For example, suppose the query string is "我|看|風景" ("I | look at | scenery") and the target string is "我|在|橋|上|看|風景|,|看|風景|的|人|在|橋|下|看|我。" (vertical bars mark word-segmentation boundaries). The minimum sliding window is then "我|在|橋|上|看|風景", which is 6 words long.

Another measure of word length is the edit distance. Unlike the minimum sliding window, it does not measure a length within a single string; it is the total length of the parts in which two strings differ. For example, "我和你" ("me and you") and "大和小" ("big and small") differ in two words (the first and the third), so their edit distance is 2.

At present, the degree of match between the query string and the target string is usually determined from such a length or distance: the smaller the minimum sliding window length or the edit distance, the better the match, and the larger, the worse.

In some cases, however, a plain length or distance does not accurately reflect the degree of match. Suppose the query string is "Nokia battery" and the search results are A, "Nokia battery"; B, "Nokia mobile phone, free battery"; and C, "Nokia n73 mobile phone original battery". By plain distance, "Nokia" and "battery" in A are at distance 0, the best match, while in both B and C they are 3 words apart, so neither B nor C matches well.
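The plain minimum sliding window described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `min_window_length` and the English token lists standing in for the segmented Chinese strings are assumptions made for this example.

```python
from collections import Counter


def min_window_length(target, query):
    """Length, in words, of the shortest span of `target` that contains
    every word of `query`. Returns None if no such span exists."""
    need = Counter(query)
    have = Counter()
    missing = len(need)  # distinct query words not yet covered by the window
    best = None
    left = 0
    for right, word in enumerate(target):
        if word in need:
            have[word] += 1
            if have[word] == need[word]:
                missing -= 1
        # Shrink from the left while the window still covers the query.
        while missing == 0:
            length = right - left + 1
            if best is None or length < best:
                best = length
            out = target[left]
            if out in need:
                if have[out] == need[out]:
                    missing += 1
                have[out] -= 1
            left += 1
    return best


query = ["Nokia", "battery"]
result_a = ["Nokia", "battery"]
result_b = ["Nokia", "mobile phone", ",", "free", "battery"]
result_c = ["Nokia", "n73", "mobile phone", "original", "battery"]

print(min_window_length(result_a, query))  # 2
print(min_window_length(result_b, query))  # 5
print(min_window_length(result_c, query))  # 5
```

With these token lists, B and C get the same window length, so a plain length cannot tell them apart.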
In fact, however, "n73 mobile phone" in C is strongly related to "Nokia", and "original" is strongly related to "battery". Although B and C both have 3 intervening words, C matches far better than B.

Earlier work has considered that different words should contribute differently to the distance, for example by setting word weights according to part of speech (POS). Setting weights by part of speech is still too simple. It does not address the essential question of whether the query string and the target string are semantically related, so the resulting length or distance cannot accurately reflect the degree of match between them; that is, it cannot guarantee that target strings semantically related to the query string are ranked first.

[Summary of the Invention]

The present application provides a method and apparatus for sorting search results which, by using the semantic relatedness between the query string and the target string, sort the target strings more accurately and reflect how well each target string matches the query string.

The present application provides a method for sorting search results, in which a server pre-computes the semantic association weight between every two words in a statistical sample, and obtains and stores a word weight table. The method further includes:

the server receives a query string entered at a user terminal, searches according to the query string, and obtains target strings;

the server segments the query string and each target string into words, and pairs each segment of the query string in turn with each segment of the target string;

the word weight table is looked up to obtain the weight of each segment pair; and

a weighted word length is obtained from the weights, the target strings are sorted by weighted word length, and the result is returned to the user terminal.

The step in which the server pre-computes the semantic association weight between every two words in the statistical sample and obtains the word weight table includes:

the server obtains the statistical sample;

a first word and a second word are selected from the statistical sample, and the number of times they co-occur in the sample, $C(\text{first word}, \text{second word})$, is counted;

the number of occurrences of the second word in the sample, $\sum C(Y_i, \text{second word})$, is counted, where $Y_i$ ranges over every word that co-occurs with the second word;

the probability of the first word given that the second word occurs is computed as $P(\text{first word} \mid \text{second word}) = C(\text{first word}, \text{second word}) / \sum C(Y_i, \text{second word})$;

when the second word is queried, the semantic relevance weight between the first word and the second word is taken as $W = 1 - P$, where $W$ is the weight and $P$ is the probability of the first word given that the second word occurs; and

the above steps are repeated to obtain in turn the semantic relevance weight of each word in the sample relative to the other words, yielding the word weight table.

The statistical sample may come from any form of text or symbols, the text including web page text, user search logs, and user click logs.
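The weight-table steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not define "co-occur", so here two words co-occur when they appear in the same sample unit (for example one document or one log entry), counted once per unit. The function name `build_weight_table` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import permutations


def build_weight_table(samples):
    """samples: iterable of word lists, one per sample unit.
    Returns {(first, second): W} with W = 1 - P(first | second)."""
    pair_counts = Counter()  # C(first, second)
    totals = Counter()       # sum over Y_i of C(Y_i, second)
    for words in samples:
        # Assumption: each distinct ordered pair counts once per sample unit.
        for first, second in permutations(set(words), 2):
            pair_counts[(first, second)] += 1
            totals[second] += 1
    return {
        (first, second): 1.0 - count / totals[second]
        for (first, second), count in pair_counts.items()
    }


# Hypothetical sample data, for illustration only.
samples = [
    ["Nokia", "mobile phone"],
    ["Nokia", "mobile phone"],
    ["Nokia", "mobile phone", "battery"],
    ["mobile phone", "case"],
]
table = build_weight_table(samples)
print(table[("mobile phone", "Nokia")])  # 0.25
print(table[("battery", "Nokia")])       # 0.75
```

Pairs that never co-occur are simply absent from the table; a caller has to choose a default for them (the later sketches in this document assume 1.0, which is also an assumption).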
The weighted word length may be a minimum sliding window weighted length. In that case, the step of obtaining the weighted word length from the weights and sorting each target string includes:

for each segment of the target string, taking the minimum of its weights against the segments of the query string; or, for each segment of the query string, taking the minimum of its weights against the segments of the target string;

for each target string, computing the minimum sliding window weighted length from those minimum weights; and

comparing the minimum sliding window weighted lengths of the target strings: a target string with a smaller length is ranked earlier, and one with a larger length is ranked later.

The minimum sliding window weighted length of each target string is computed as

$$\sum_{i=k}^{h} W_i = \sum_{i=k}^{h} \min_{1 \le j \le m} W(T_i, Q_j)$$

where $W$ denotes a weight, $T_i$ is the $i$-th segment of the target string, $k$ and $h$ are the start and end positions of the minimum sliding window in the target string, $Q_j$ is the $j$-th segment of the query string, and $m$ is the number of segments in the query string.

The present application also provides a method for sorting search results, in which a server pre-computes the semantic association weight between every two words in a statistical sample, and obtains and stores a word weight table.
The method further includes:

the server receives a query string entered at a user terminal, searches according to the query string, and obtains target strings;

the server segments the query string and each target string into words;

the server computes, from the stored word weight table, the minimum weight of the inserted words relative to the segments of the query string;

the server computes, from the stored word weight table, the minimum weight of the deleted words relative to the segments of the target string; and

a total edit distance is computed from the minimum weights, the target strings are sorted by total edit distance, and the result is returned to the user terminal.

The step of computing, from the word weight table, the minimum weight of the inserted words relative to the segments of the query string includes:

obtaining from the word weight table the weights of each inserted word relative to the segments of the query string; and

computing the minimum weight of the inserted words relative to the segments of the query string as

$$W_I = \sum_{t=1}^{n} \min_{1 \le j \le m} W(I_t, Q_j)$$

where $W$ denotes a weight, $I_t$ is the $t$-th inserted segment, $n$ is the number of inserted segments, $Q_j$ is the $j$-th segment of the query string, and $m$ is the number of segments in the query string.

The step of computing, from the word weight table, the minimum weight of the deleted words relative to the segments of the target string includes:

obtaining from the word weight table the weights of each deleted word relative to the segments of the target string; and

computing the minimum weight of the deleted words relative to the segments of the target string as

$$W_D = \sum_{d=1}^{p} \min_{1 \le i \le q} W(D_d, T_i)$$

where $W$ denotes a weight, $T_i$ is the $i$-th segment of the target string, $q$ is the number of segments in the target string, $D_d$ is the $d$-th deleted segment, and $p$ is the number of deleted segments.
The step of computing the total edit distance from the minimum weights and sorting each target string includes: determining a total edit distance for each target string, the total edit distance being:

$$W_{total} = W_I + W_D$$

where $W_{total}$ is the total edit distance, $W_I$ is the minimum weight of the inserted words relative to the segments of the query string, and $W_D$ is the minimum weight of the deleted words relative to the segments of the target string; and

comparing the total edit distances of the target strings: a target string with a smaller total edit distance is ranked earlier, and one with a larger total edit distance is ranked later.

Before the total edit distance is computed, the method may further include computing the minimum weight for the edit distance of the replaced words. In that case, the step of computing the total edit distance from the minimum weights and determining the degree of match between the query string and the target string includes:

determining a total edit distance for each target string, the total edit distance being

$$W_{total} = W_I + W_D + W_C$$

where $W_{total}$ is the total edit distance, $W_I$ is the minimum weight of the inserted words relative to the segments of the query string, $W_D$ is the minimum weight of the deleted words relative to the segments of the target string, and $W_C$ is the minimum weight of the replaced words relative to the segments of the query string and/or the target string; and

comparing the total edit distances of the target strings: a target string with a smaller total edit distance is ranked earlier, and one with a larger total edit distance is ranked later.

The minimum weight for the edit distance of a replaced word may be obtained in either of the following ways:

setting it equal to a preset fixed value; or

setting it equal to the sum, the average, or the larger of the minimum weight of the inserted word relative to the segments of the query string and the minimum weight of the deleted word relative to the segments of the target string.
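The weighted edit distance above can be sketched as follows. Several choices here are assumptions rather than the patent's method: the edit script comes from Python's `difflib.SequenceMatcher`, which is not guaranteed to be a minimal edit script; a replacement is charged as a deletion plus an insertion (the "sum" option for $W_C$); a word that also occurs on the other side costs 0; pairs missing from the table default to 1.0. The function names and the weights are hypothetical.

```python
from difflib import SequenceMatcher


def min_weight(word, others, weights, default=1.0):
    """Smallest table weight of `word` against any word in `others`."""
    if not others:
        return default
    if word in others:
        return 0.0  # assumption: a word present on the other side costs nothing
    return min(weights.get((word, other), default) for other in others)


def weighted_edit_distance(query, target, weights):
    """W_total = W_I + W_D, with replacements charged as delete + insert."""
    w_insert = 0.0  # inserted words, weighed against the query segments
    w_delete = 0.0  # deleted words, weighed against the target segments
    matcher = SequenceMatcher(None, query, target, autojunk=False)
    for tag, q1, q2, t1, t2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            w_insert += sum(min_weight(w, query, weights) for w in target[t1:t2])
        if tag in ("delete", "replace"):
            w_delete += sum(min_weight(w, target, weights) for w in query[q1:q2])
    return w_insert + w_delete


# Hypothetical weights, for illustration only.
weights = {
    ("original", "battery"): 0.2,
    ("original", "Nokia"): 0.6,
    ("free", "battery"): 0.7,
}
query = ["Nokia", "mobile phone", "battery"]
targets = [
    ["Nokia", "mobile phone", ",", "free", "battery"],
    ["original", "Nokia", "mobile phone", "battery"],
]
ranked = sorted(targets, key=lambda t: weighted_edit_distance(query, t, weights))
print(ranked[0])  # ['original', 'Nokia', 'mobile phone', 'battery']
```

Sorting by the returned value in ascending order puts the target with the smaller total edit distance first, as the ranking rule above requires.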
The present application further provides an apparatus for sorting search results, including:

a word weight table obtaining module, configured to compute the semantic association weight between every two words in a statistical sample, and to obtain and store a word weight table;

a word obtaining module, configured to receive a query string entered at a user terminal, search according to the query string, and obtain target strings;

a word segmentation module, configured to segment the query string and each target string into words after the server has obtained them;

a combination module, configured to pair each segment of the query string in turn with each segment of the target string;

a query module, configured to look up the word weight table and obtain the weight of each segment pair; and

a matching module, configured to obtain a weighted word length from the weights, sort the target strings, and return the result to the user terminal.

The word weight table obtaining module includes:

a sample obtaining module, configured to obtain the statistical sample;

a first statistics module, configured to select a first word and a second word from the statistical sample and count the number of times they co-occur in the sample, $C(\text{first word}, \text{second word})$;

a second statistics module, configured to count the number of occurrences of the second word in the sample, $\sum C(Y_i, \text{second word})$, where $Y_i$ ranges over every word that co-occurs with the second word;

a probability computing module, configured to compute the probability of the first word given that the second word occurs, $P(\text{first word} \mid \text{second word}) = C(\text{first word}, \text{second word}) / \sum C(Y_i, \text{second word})$;

a weight computing module, configured to take, when the second word is queried, the semantic relevance weight between the first word and the second word as $W = 1 - P$, where $W$ is the weight and $P$ is the probability of the first word given that the second word occurs; and

a generating module, configured to generate the word weight table once the semantic relevance weight of each word in the sample relative to the other words has been obtained.

When the weighted word length is the minimum sliding window weighted length, the matching module includes:

a minimum weight obtaining module, configured to take, for each segment of the target string, the minimum of its weights against the segments of the query string; or to take, for each segment of the query string, the minimum of its weights against the segments of the target string;

a first computing module, configured to compute, for each target string, the minimum sliding window weighted length from those minimum weights; and

a sorting module, configured to compare the minimum sliding window weighted lengths of the target strings, ranking a target string with a smaller length earlier and one with a larger length later.

The present application further provides an apparatus for sorting search results, including:

a word weight table obtaining module, configured to compute the semantic association weight between every two words in a statistical sample, and to obtain and store a word weight table;

a word obtaining module, configured to receive a query string entered at a user terminal, search according to the query string, and obtain target strings;

a word segmentation module, configured to segment the query string and each target string into words after the server has obtained them;

a first minimum weight computing module, configured to compute the minimum weight of the inserted words relative to the segments of the query string;

a second minimum weight computing module, configured to compute the minimum weight of the deleted words relative to the segments of the target string; and

a matching module, configured to compute a total edit distance from the minimum weights, sort the target strings, and return the result to the user terminal.

The matching module includes:

a first total edit distance computing module, configured to determine a total edit distance for each target string, the total edit distance being

$$W_{total} = W_I + W_D$$

where $W_{total}$ is the total edit distance, $W_I$ is the minimum weight of the inserted words relative to the segments of the query string, and $W_D$ is the minimum weight of the deleted words relative to the segments of the target string; and

a sorting module, configured to compare the total edit distances of the target strings, ranking a target string with a smaller total edit distance earlier and one with a larger total edit distance later.

The apparatus may further include:

a third minimum weight computing module, configured to obtain the minimum weight for the edit distance of the replaced words before the total edit distance is computed.

In that case the matching module includes:

a second total edit distance computing module, configured to determine a total edit distance for each target string, the total edit distance being

$$W_{total} = W_I + W_D + W_C$$

where $W_{total}$ is the total edit distance, $W_I$ is the minimum weight of the inserted words relative to the segments of the query string, $W_D$ is the minimum weight of the deleted words relative to the segments of the target string, and $W_C$ is the minimum weight of the replaced words relative to the segments of the query string and/or the target string; and

a sorting module, configured to compare the total edit distances of the target strings, ranking a target string with a smaller total edit distance earlier and one with a larger total edit distance later.

Conventional plain computation of word length or distance does not take into account how strongly the words in the target string are semantically associated with the query words. By introducing word weights that express the semantic relatedness between the query string and the target string, the present application sorts the target strings more accurately, ranks target strings semantically related to the query string first, and reflects how well each target string matches the query string. It is simple to apply in practice and works well.
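The minimum sliding window weighted length from the summary above can be sketched as follows. This is one reading, not a definitive implementation: the window is taken to be the shortest span of the target that covers every query word, target words that also occur in the query contribute 0, and pairs missing from the table default to 1.0. The function name, the non-empty-query precondition, and all weights are assumptions for illustration.

```python
def weighted_window_length(target, query, weights, default=1.0):
    """Sum of per-word minimum weights over the shortest span of `target`
    that covers every word of a non-empty `query`; None if no span exists."""
    needed = set(query)
    best = None  # (k, h): inclusive start and end of the shortest span
    for k in range(len(target)):
        seen = set()
        for h in range(k, len(target)):
            if target[h] in needed:
                seen.add(target[h])
            if seen == needed:
                if best is None or h - k < best[1] - best[0]:
                    best = (k, h)
                break  # extending the span further only makes it longer
    if best is None:
        return None
    k, h = best
    total = 0.0
    for word in target[k:h + 1]:
        if word in needed:
            continue  # assumption: W_i = 0 when T_i occurs in the query
        total += min(weights.get((word, q), default) for q in query)
    return total


# Hypothetical weights, for illustration only.
weights = {
    ("n73", "Nokia"): 0.1,
    ("mobile phone", "Nokia"): 0.2,
    ("original", "battery"): 0.2,
    ("free", "battery"): 0.7,
}
query = ["Nokia", "battery"]
results = {
    "A": ["Nokia", "battery"],
    "B": ["Nokia", "mobile phone", ",", "free", "battery"],
    "C": ["Nokia", "n73", "mobile phone", "original", "battery"],
}
order = sorted(results, key=lambda name: weighted_window_length(results[name], query, weights))
print(order)  # ['A', 'C', 'B']
```

With these made-up weights, C now ranks ahead of B even though both have three intervening words, which is the behaviour the summary argues for.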
[Embodiments]

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present application.

The present application adds a semantic factor to the computation of word distance or word length. By taking the semantic association between the query string and the target string into account, it measures their degree of match better, so that the search results of a search engine can be ranked more reasonably. The present application can of course be used wherever a degree of string match is computed, and is not limited to search engines.

Because the present application considers the semantics between strings, it needs the semantic association weight between every two words. The following first explains how these weights are obtained to build the word weight table. Referring to Fig. 1, the procedure includes the following steps:

Step 101: the server obtains a statistical sample. The sample may come from any form of text or symbols, the text including web page text, user search logs, user click logs, and the like.

In general, the more often the first word and the second word co-occur in the statistical sample, the more closely related they are.
For example, if "Nokia" and "mobile phone" often appear together in text, or users often search for "Nokia" and then click on results containing "mobile phone", this indicates that "Nokia" and "mobile phone" are, to some extent, highly relevant. So if a user searches for "Nokia", a result containing "mobile phone" is "not a surprise" to us.

Step 102: Select a first word and a second word from the statistical sample, and count the number of times C(first word, second word) that the first word and the second word appear together in the statistical sample.

For example, count the number of co-occurrences C(mobile phone, Nokia) of "mobile phone" and "Nokia". The counts for other word pairs can be obtained in the same way, so that the weights of all words (when each word is searched) can finally be obtained.

Step 103: Count the number of times the second word appears in the statistical sample, ΣC(Yi, second word), where Yi stands for each word that co-occurs with the second word.

For example, the total number of times "Nokia" co-occurs with other words (that is, the total number of occurrences of "Nokia") is ΣC(Yi, Nokia), where Yi represents each word that co-occurs with "Nokia".

Step 104: Calculate the probability of the first word given that the second word occurs:

$$P(\text{first word} \mid \text{second word}) = \frac{C(\text{first word}, \text{second word})}{\sum_i C(Y_i, \text{second word})}$$

For example, the probability of "mobile phone" given that "Nokia" occurs is P(mobile phone | Nokia) = C(mobile phone, Nokia) / ΣC(Yi, Nokia).

Step 105: When the second word is queried, take the semantic association weight of the first word and the second word as W = 1 - P, where W is the weight and P is the probability of the first word given that the second word occurs.

For example, take W = 1 - P as the semantic association weight of "mobile phone" and "Nokia" when "Nokia" is queried. In this example, the weight is 1 minus the conditional probability of the first word given the second word.
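Steps 102 to 105 above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the present application: the function name `build_word_weight_table` and the input format (a list of token lists) are hypothetical, and it assumes that two words "appear together" once for every sample that contains both, which the text does not specify.

```python
from collections import Counter
from itertools import permutations


def build_word_weight_table(samples):
    """Return {(first_word, second_word): W} with W = 1 - P(first_word | second_word).

    `samples` is an iterable of token lists (hypothetical input format).
    Assumption: two words co-occur once for every sample that contains both.
    """
    pair_counts = Counter()  # C(first word, second word)
    for tokens in samples:
        for first, second in permutations(set(tokens), 2):
            pair_counts[(first, second)] += 1

    totals = Counter()  # sum over Y_i of C(Y_i, second word)
    for (_, second), count in pair_counts.items():
        totals[second] += count

    # W = 1 - C(first, second) / sum_i C(Y_i, second)
    return {
        (first, second): 1.0 - count / totals[second]
        for (first, second), count in pair_counts.items()
    }


samples = [
    ["Nokia", "mobile phone"],
    ["Nokia", "mobile phone", "battery"],
    ["Nokia", "battery"],
]
table = build_word_weight_table(samples)
# Weight of "mobile phone" when "Nokia" is queried.
print(table[("mobile phone", "Nokia")])
```

Note that the table is not symmetric: the weight of "mobile phone" when "Nokia" is queried generally differs from the weight of "Nokia" when "mobile phone" is queried, because the two conditional probabilities have different denominators.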
In other embodiments, the weight may be expressed in other ways, for example by using P directly as the weight.

Step 106: Determine whether all the words in the statistical sample have been processed. If so, perform step 107; otherwise, repeat the above steps so that the semantic association weight of each word in the statistical sample relative to the other words is obtained in turn.

Step 107: Output the semantic association weights of each word in the statistical sample relative to the other words to obtain the word weight table.

For example, one possible form of the word weight table is shown in Table 1:

Table 1

| Word 1      | Word 2      | Weight |
| first word  | second word | W12    |
| first word  | third word  | W13    |
| second word | third word  | W23    |
| word m      | word n      | Wmn    |

It should be noted that the word weight table shown in Table 1 is merely one specific embodiment. In practical applications the word weight table may take other forms; the form of the word weight table is not limited here.

At this point the word weight table has been obtained, that is, the weight of the first word when the second word is queried. This weight can be obtained in any way. FIG. 1 shows only one example, in which the weight is obtained with a statistical language model. In practical applications other methods can be used, such as any automatic calculation or manual setting. The way in which the word weight table is obtained is not limited here.

FIG. 2 is a flowchart of a method for sorting search results according to an embodiment of the present application, which specifically includes the following steps:

Step 201: The server obtains a query string and target strings.

The query string is usually input by the user. A target string is usually a string related to the query string that the server obtains by searching. For example, the query string entered by the user is "Nokia battery", and the target strings obtained after the server's retrieval are A "Nokia battery", B "Nokia mobile phone, free battery", and C "Nokia n73 mobile phone original battery"; then A, B, and C obtained through the search are all target strings. The purpose of this embodiment is to determine the degree of matching between each target string (such as retrieval results A, B, and C) and the query string. That is, the server receives the query string input by the user terminal, searches according to the query string, and obtains the target strings.

In this embodiment, the query string "Nokia battery" and target string C "Nokia n73 mobile phone original battery" are used as the example. Target string A "Nokia battery" and target string B "Nokia mobile phone, free battery" are processed in essentially the same way as target string C, so they are not described in detail.

Step 202: The server segments the query string and the target string separately to obtain the word segments constituting the query string and the word segments constituting the target string.

Here, let the query string be Q and the target string be T. Segmenting the query string yields Q1Q2...Qm, and segmenting the target string yields T1T2...Tn. In this embodiment, segmenting the query string gives Q1Q2 = Nokia | battery, and segmenting the target string gives T1T2T3T4T5 = Nokia | n73 | mobile phone | original | battery.

Word segmentation in the present application may be any method of splitting a string: the string may be split into words in the linguistic sense, or into single characters, letters, symbols, and the like.
Step 203: Combine each word segment of the query string in turn with the word segments of the target string, in pairs, to obtain a number of segment combinations, each consisting of one target-string segment and one query-string segment. Specifically, for each Ti, obtain (Ti, Q1), (Ti, Q2), ..., (Ti, Qm).

The segment combinations obtained in this embodiment are: (T1, Q1), (T1, Q2), (T2, Q1), (T2, Q2), (T3, Q1), (T3, Q2), (T4, Q1), (T4, Q2), (T5, Q1), (T5, Q2).

Step 204: Look up the word weight table to obtain the weight of each segment combination.

Here, let W denote the weight. The weights of the segment combinations obtained from the weight table are then: W(T1, Q1), W(T1, Q2), W(T2, Q1), W(T2, Q2), W(T3, Q1), W(T3, Q2), W(T4, Q1), W(T4, Q2), W(T5, Q1), W(T5, Q2).

Let W(T1, Q1) = W1, W(T1, Q2) = W1', W(T2, Q1) = W2, W(T2, Q2) = W2', W(T3, Q1) = W3, W(T3, Q2) = W3', W(T4, Q1) = W4, W(T4, Q2) = W4', W(T5, Q1) = W5, W(T5, Q2) = W5'.

If Ti appears in Q, take Wi = 0. For example, T1 is "Nokia" and Q1 is also "Nokia", so W(T1, Q1) = W1 = 0; similarly, W(T5, Q2) = W5' = 0.

Step 205: Obtain the weighted word length according to the weights.

In this embodiment, the weighted word length is the minimum sliding window weighted length. In this case, step 205 specifically includes the following steps:

i) For each word segment of the target string, obtain its minimum weight over the word segments of the query string; or, for each word segment of the query string, obtain its minimum weight over the word segments of the target string.

The two procedures are very similar, so only the case of obtaining, for each word segment of the target string, the minimum weight over the word segments of the query string is described below.
Specifically, in the above embodiment, it is necessary to obtain the minimum of the two weights of T1 relative to Q1 and Q2, the minimum of the two weights of T2 relative to Q1 and Q2, and so on. Suppose the minimum of W(T1, Q1) and W(T1, Q2) is W1, the minimum of W(T2, Q1) and W(T2, Q2) is W2, the minimum of W(T3, Q1) and W(T3, Q2) is W3, the minimum of W(T4, Q1) and W(T4, Q2) is W4, and the minimum of W(T5, Q1) and W(T5, Q2) is W5'.

ii) For each target string, calculate the minimum sliding window weighted length from the minimum weights.

The minimum sliding window weighted length of each target string is determined as

$$\text{minimum sliding window weighted length} = \sum_{i=k}^{h} \min_{1 \le j \le m} W(T_i, Q_j)$$

where W is the weight, $T_i$ is the i-th word segment of the target string, k and h are the start and end positions of the minimum sliding window in the target string, $Q_j$ is the j-th word segment of the query string, and m is the number of word segments in the query string.

For the above embodiment, the minimum sliding window weighted length is W1 + W2 + W3 + W4 + W5'.

By repeating steps 202 to 205, the minimum sliding window weighted length of the query string with respect to each target string can be obtained.

Step 206: Determine the degree of matching between the query string and each target string according to the weighted word length; that is, sort the target strings according to the weighted word length and feed the result back to the user terminal.

Specifically, compare the minimum sliding window weighted lengths of the target strings: the smaller the length, the higher the degree of matching and the higher the ranking; the larger the length, the lower the degree of matching and the lower the ranking.

At this point, the degree of matching between the query string and each target string has been determined.
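Steps 203 to 206 above can be illustrated with the following Python sketch. It is a simplified reading under stated assumptions, not the implementation of the present application: the names `min_window`, `weighted_length`, and `rank` are hypothetical; the minimum sliding window is assumed to be the shortest span of the target string that contains every query segment occurring in the target; pairs missing from the weight table are assumed to receive a default weight of 1.0; and the query is assumed to be non-empty.

```python
def min_window(target, query):
    """Return (k, h): the shortest span of `target` containing every query
    segment that occurs in `target` (assumed meaning of the minimum window)."""
    needed = set(query) & set(target)
    best = (0, len(target) - 1)  # assumption: fall back to the whole string
    if not needed:
        return best
    for k in range(len(target)):
        seen = set()
        for h in range(k, len(target)):
            if target[h] in needed:
                seen.add(target[h])
            if seen == needed:
                if h - k < best[1] - best[0]:
                    best = (k, h)
                break
    return best


def weighted_length(target, query, weights, default=1.0):
    """Sum over the window of min_j W(T_i, Q_j); W is 0 when T_i is in the query."""
    k, h = min_window(target, query)

    def w(t, q):
        return 0.0 if t in query else weights.get((t, q), default)

    return sum(min(w(t, q) for q in query) for t in target[k:h + 1])


def rank(targets, query, weights):
    """Sort target strings by ascending weighted length (best match first)."""
    return sorted(targets, key=lambda t: weighted_length(t, query, weights))
```

With made-up weights, the query Nokia | battery, and the three target strings A, B, and C of this embodiment, `rank` places A first because every segment of A occurs in the query and its weighted length is therefore 0.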
Conventional simple word-length calculations do not consider the semantic association between the words in the target string and the query terms, so the resulting word length cannot accurately reflect the degree of matching between the query and the target. For example, "Nokia battery" and "Nokia n73 mobile phone original battery" differ greatly in length, but if the query is "Nokia battery", the two do not differ much. By introducing word weights that represent the semantic association between the query string and the target string, the present application sorts the target strings more accurately and ranks target strings semantically related to the query string first, reflecting the degree of matching between each target string and the query string. It is simple to apply in practice and works well.

FIG. 3 is a flowchart of another method for sorting search results according to an embodiment of the present application. This embodiment calculates the difference between the query string and the target string on the basis of the edit distance. The edit distance is the minimum number of basic operations required to change one string into another, or the sum of the lengths of the differing parts of the two strings. Common basic operations include inserting a character/word, deleting a character/word, and replacing a character/word; other operations may be defined as needed. For example, changing "I love you" into "I don't love her" requires at least two basic operations, inserting "not" and replacing "you" with "her", so the edit distance between the two is 2 (counted per character in the original Chinese phrases). Similarly, the edit distance between "invisible wings" and "good chicken wings" is 3.

The flow shown in FIG. 3 specifically includes the following steps:

Step 301: The server obtains the query string and the target strings.

The query string is usually input by the user, and a target string is usually a string related to the query string that the server obtains by searching.
For example, the query string is "Nokia mobile phone battery" and the target strings are "original Nokia mobile phone battery" and "Nokia mobile phone, free battery". That is, the server receives the query string input by the user terminal, searches according to the query string, and obtains the target strings. The purpose of this embodiment is to determine the degree of matching between each target string and the query string.

In this embodiment, the query string "Nokia mobile phone battery" and the target string "original Nokia mobile phone battery" are used as the example. The target string "Nokia mobile phone, free battery" is processed in essentially the same way as the target string "original Nokia mobile phone battery", so it is not described in detail.

Step 302: The server segments the query string and the target string separately to obtain the word segments constituting the query string and the word segments constituting the target string.

Here, let the query string be Q and the target string be T. Segmenting the query string yields Q1Q2...Qm, and segmenting the target string yields T1T2...Tn. In this embodiment, segmenting the query string gives Q1Q2Q3 = Nokia | mobile phone | battery, and segmenting the target string gives T1T2T3 = original | Nokia | battery.

Word segmentation in the present application may be any method of splitting a string: the string may be split into words in the linguistic sense, or into single characters, letters, symbols, and the like.

Step 303: The server calculates, according to the word weight table, the minimum weight of the inserted words relative to the word segments of the query string.

Specifically, the weights of each inserted word relative to the word segments of the query string are obtained from the word weight table. In this example, the word "original" is inserted. Let the inserted word be I; the weights of the inserted word relative to the word segments of the query string are then W(I1, Q1), W(I1, Q2), and W(I1, Q3).

The minimum weight of the inserted words relative to the word segments of the query string is calculated as

$$W_I = \sum_{t=1}^{n'} \min_{1 \le j \le m} W(I_t, Q_j)$$

where W denotes the weight, $I_t$ is the t-th word segment of the inserted words, n' is the number of inserted word segments, $Q_j$ is the j-th word segment of the query string, and m is the number of word segments in the query string.

Step 304: Calculate, according to the word weight table, the minimum weight of the deleted words relative to the word segments of the target string.

Specifically, the weights of each deleted word relative to the word segments of the target string are obtained from the word weight table. In this example, the word "mobile phone" is deleted. Let the deleted word be D; the weights of the deleted word relative to the word segments of the target string are then W(D1, T1), W(D1, T2), and W(D1, T3).

The minimum weight of the deleted words relative to the word segments of the target string is calculated as

$$W_D = \sum_{d=1}^{p} \min_{1 \le i \le q} W(D_d, T_i)$$

where W denotes the weight, $T_i$ is the i-th word segment of the target string, q is the number of word segments in the target string, $D_d$ is the d-th word segment of the deleted words, and p is the number of deleted word segments.

Step 305: Calculate the total edit distance from the minimum weights and determine the degree of matching between the query string and the target string; that is, sort the target strings according to the total edit distance and feed the result back to the user terminal.

Specifically, a total edit distance is determined for each target string. The total edit distance of a target string is

$$W_{\text{total}} = W_I + W_D$$

where $W_{\text{total}}$ is the total edit distance, $W_I$ is the minimum weight of the inserted words relative to the word segments of the query string, and $W_D$ is the minimum weight of the deleted words relative to the word segments of the target string.

Compare the total edit distances of the target strings: the smaller the total edit distance, the higher the degree of matching and the higher the ranking; the larger the total edit distance, the lower the degree of matching and the lower the ranking.

At this point, the degree of matching between the query string and each target string has been determined. Conventional simple word-distance calculations do not consider the semantic association between the words in the target string and the query terms, so the resulting word distance cannot accurately reflect the degree of matching between the query and the target.
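The calculation of steps 303 to 305 can be sketched as follows. This is a hedged illustration rather than the implementation of the present application: the function names are hypothetical; inserted and deleted words are found by a simple membership test instead of a full edit-script alignment; weights are looked up as (changed word, reference segment); pairs missing from the table are assumed to receive a default weight of 1.0; and both strings are assumed to be non-empty.

```python
def min_weight_sum(words, segments, weights, default=1.0):
    """For each word, take its minimum weight over `segments`, then sum."""
    return sum(
        min(weights.get((word, seg), default) for seg in segments)
        for word in words
    )


def total_edit_distance(query, target, weights, default=1.0):
    """W_total = W_I + W_D for one (query, target) pair of segment lists."""
    # Assumption: a word is "inserted" if it is in the target but not the query,
    # and "deleted" if it is in the query but not the target.
    inserted = [t for t in target if t not in query]
    deleted = [q for q in query if q not in target]
    w_i = min_weight_sum(inserted, query, weights, default)  # W_I
    w_d = min_weight_sum(deleted, target, weights, default)  # W_D
    return w_i + w_d


def rank_by_edit_distance(query, targets, weights):
    """Sort target strings by ascending total edit distance (best match first)."""
    return sorted(targets, key=lambda t: total_edit_distance(query, t, weights))
```

For the example of this embodiment, `total_edit_distance` treats "original" as the inserted word and "mobile phone" as the deleted word, so the result is the smallest weight of "original" against the query segments plus the smallest weight of "mobile phone" against the target segments.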
By introducing word weights that represent the semantic association between the query string and the target string, the present application sorts the target strings more accurately and ranks target strings semantically related to the query string first, reflecting the degree of matching between each target string and the query string. It is simple to apply in practice and works well.

It should be noted that, for the embodiment shown in FIG. 3, word replacement may also occur. For example, when "I and you" is changed to "I and him", "you" can be regarded as having been replaced by "him". Word replacement can be handled as follows:

Mode one: Treat a replacement operation as a combination of an insertion and a deletion; that is, regard replacement operations as nonexistent. For example, when "I and you" is changed to "I and him", "you" is regarded as deleted and "him" as inserted. All transformations are then insertion and deletion operations, so the embodiment shown in FIG. 3 handles this case well.

Mode two: Treat replacement as a third operation in addition to insertion and deletion. For example, when "I and you" is changed to "I and him", "you" is regarded as replaced by "him". In this case, the minimum weight of the edit distance of the replacement words needs to be calculated.
Specifically, there are two ways to calculate it:

a) The minimum weight of the edit distance of a replacement word is equal to a preset fixed value; for example, it is fixed at 1; or

b) The edit distance of a replacement word is equal to the sum of the minimum weight of the inserted word relative to the word segments of the query string and the minimum weight of the deleted word relative to the word segments of the target string; or to the average of these two minimum weights; or to the larger of these two minimum weights; or to any other combination of them.

For example, the edit distance of the replacement word "him" = the minimum weight of the inserted "him" relative to the word segments of the query string "I and you" + the minimum weight of the deleted "you" relative to the word segments of the target string "I and him"; or

the edit distance of the replacement word "him" = (the minimum weight of the inserted "him" relative to the word segments of the query string "I and you" + the minimum weight of the deleted "you" relative to the word segments of the target string "I and him") / 2; and so on.
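Options a) and b) above can be written as one small helper. The sketch below is illustrative only: the function name `replacement_weight` and the mode labels are hypothetical, and the two arguments are assumed to be the already computed minimum weights of the inserted word and of the deleted word.

```python
def replacement_weight(w_insert_min, w_delete_min, mode="sum", fixed=1.0):
    """Edit-distance weight of one replacement under the options in the text.

    w_insert_min: minimum weight of the inserted word vs. the query segments.
    w_delete_min: minimum weight of the deleted word vs. the target segments.
    """
    if mode == "fixed":   # option a): preset fixed value
        return fixed
    if mode == "sum":     # option b): sum of the two minimum weights
        return w_insert_min + w_delete_min
    if mode == "mean":    # option b): average of the two minimum weights
        return (w_insert_min + w_delete_min) / 2
    if mode == "max":     # option b): larger of the two minimum weights
        return max(w_insert_min, w_delete_min)
    raise ValueError(f"unknown mode: {mode}")
```

Keeping the combination rule behind a single parameter makes it easy to compare the rankings produced by each option on the same data.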
In the case of mode two, step 305 specifically includes:

For each target string, determine a total edit distance, the total edit distance being

$$W_{\text{total}} = W_I + W_D + W_C$$

where $W_{\text{total}}$ is the total edit distance, $W_I$ is the minimum weight of the inserted words relative to the word segments of the query string, $W_D$ is the minimum weight of the deleted words relative to the word segments of the target string, and $W_C$ is the minimum weight of the replacement words relative to the word segments of the query string and/or the target string.

Compare the total edit distances of the target strings: the smaller the total edit distance, the higher the degree of matching and the higher the ranking; the larger the total edit distance, the lower the degree of matching and the lower the ranking.

It should be noted that the weights may be calculated against the query string and the target string in an interleaved manner. In the embodiment shown in FIG. 3, for example, the weights of inserted words are calculated against the query string, and the weights of deleted words are calculated against the target string.

It should be noted that, for the embodiments shown in FIG. 2 and FIG. 3, word segmentation may be any method of splitting a string: the string may be split into words in the linguistic sense, or into single characters, letters, or symbols.

It should be noted that, for the embodiments shown in FIG. 2 and FIG. 3, the weights may be subjected to any form of calculation or transformation, such as taking the logarithm. The maximum, the average, or another function of a target word's weights relative to the individual query words may also be taken as the weight (weighted length) of that word.

It should be noted that, for the embodiments shown in FIG. 2 and FIG. 3, the target string may conversely be treated as the query string and the query string as the target string, without any essential difference.

It should be noted that, for the embodiments shown in FIG. 2 and FIG. 3, the interval over which the word distance or length is calculated may be the entire string or any interval selected by the algorithm, such as the part of one string that differs from the other string.

It should be noted that the matching method need not use the minimum sliding window or the edit distance; it may be any calculation based on weighted word distance or word length.

It should be noted that the present application is not limited to retrieval systems such as search engines; it can be applied to any system that calculates the degree of matching between two strings.

The present application also discloses an apparatus for sorting search results. Referring to FIG. 4, the apparatus specifically includes:

a word weight table obtaining module 401, configured to calculate the semantic association weight between every two words in the statistical sample, and to obtain and save the word weight table;

a word obtaining module 402, configured to receive the query string input by the user terminal, search according to the query string, and obtain the target strings;

a word segmentation module 403, configured to segment the query string and the target string separately after the server obtains them;

a combination module 404, configured to combine each word segment of the query string in turn with the word segments of the target string, in pairs;

a query module 405, configured to look up the word weight table to obtain the weight of each segment combination;

a matching module 406, configured to obtain the weighted word length according to the weights, sort the target strings, and feed the result back to the user terminal.
The word weight table obtaining module 401 may specifically include:

a sample obtaining module, configured to obtain a statistical sample;

a first statistics module, configured to select a first word and a second word from the statistical sample and count the number of times C(first word, second word) that the first word and the second word appear together in the statistical sample;

a second statistics module, configured to count the number of times the second word appears in the statistical sample, ΣC(Yi, second word), where Yi stands for each word that co-occurs with the second word;

a probability calculation module, configured to calculate the probability of the first word given that the second word occurs, P(first word | second word) = C(first word, second word) / ΣC(Yi, second word);

a weight calculation module, configured to take, when the second word is queried, the semantic association weight of the first word and the second word as W = 1 - P, where W is the weight and P is the probability of the first word given that the second word occurs;

a generation module, configured to generate the word weight table after the semantic association weights of each word in the statistical sample relative to the other words have been obtained.

When the weighted word length is the minimum sliding window weighted length, the matching module 406 may specifically include:

a minimum weight obtaining module, configured to obtain, for each word segment of the target string, its minimum weight over the word segments of the query string, or, for each word segment of the query string, its minimum weight over the word segments of the target string;

a first calculation module, configured to calculate, for each target string, the minimum sliding window weighted length from the minimum weights;

a sorting module, configured to compare the minimum sliding window weighted lengths of the target strings, ranking a target string earlier when its length is smaller and later when its length is larger; that is, the smaller the length, the higher the degree of matching is judged to be, and the larger the length, the lower the degree of matching is judged to be.

With the embodiment shown in FIG. 4, introducing word weights that represent the semantic association between the query string and the target string reflects the degree of matching between each target string and the query string more accurately. It is simple to apply in practice and works well.

An embodiment of the present application further provides an apparatus for sorting search results. Referring to FIG. 5, the apparatus includes:

a word weight table obtaining module 501, configured to calculate the semantic association weight between every two words in the statistical sample, and to obtain and save the word weight table;

a word obtaining module 502, configured to receive the query string input by the user terminal, search according to the query string, and obtain the target strings;

a word segmentation module 503, configured to segment the query string and the target string separately after the server obtains them;

a first minimum weight calculation module 504, configured to calculate the minimum weight of the inserted words relative to the word segments of the query string;

a second minimum weight calculation module 505, configured to calculate the minimum weight of the deleted words relative to the word segments of the target string;

a matching module 506, configured to calculate the total edit distance from the minimum weights, sort the target strings, and feed the result back to the user terminal.
The matching module 506 may specifically include:

a first total edit distance calculation module, configured to determine a total edit distance for each target string, the total edit distance being

$$W_{\text{total}} = W_I + W_D$$

where $W_{\text{total}}$ is the total edit distance, $W_I$ is the minimum weight of the inserted words relative to the word segments of the query string, and $W_D$ is the minimum weight of the deleted words relative to the word segments of the target string;

a sorting module, configured to compare the total edit distances of the target strings, ranking a target string earlier when its total edit distance is smaller and later when it is larger; that is, the smaller the total edit distance, the higher the degree of matching is judged to be, and the larger the total edit distance, the lower the degree of matching is judged to be.

The apparatus shown in FIG. 5 may further include:

a third minimum weight calculation module, configured to obtain the minimum weight of the edit distance of the replacement words before the total edit distance is calculated. In this case, the matching module 506 may specifically include:

a second total edit distance calculation module, configured to determine, for each target string, a total edit distance $W_{total} = W_I + W_D + W_C$, where $W_{total}$ denotes the total edit distance, $W_I$ denotes the minimum weight of the inserted words relative to the segmented words of the query string, $W_D$ denotes the minimum weight of the deleted words relative to the segmented words of the target string, and $W_C$ denotes the minimum weight of the replacement words relative to the segmented words of the query string and/or the target string; and

a sorting module, configured to compare the total edit distances of the target strings: a target string with a smaller total edit distance is ranked earlier, and one with a larger total edit distance is ranked later. That is, the smaller the total edit distance, the higher the degree of matching, and the larger the total edit distance, the lower the degree of matching.

By introducing word weights that represent the semantic relevance between the query string and the target strings, the device shown in FIG. 5 reflects the degree of matching between each target string and the query string more accurately. It is also simple to apply in practice and works well.

It should be noted that, for convenience of description, the above devices are described as separate modules divided by function. Of course, when the present application is implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware.

It should be noted that, because the system embodiments are substantially similar to the method embodiments, they are described only briefly; for related details, refer to the corresponding parts of the description of the method embodiments.

It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations.
Furthermore, the terms "include" and "comprise", and any other variants thereof, are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not expressly listed, or elements that are inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article, or device that includes the element.

From the above description of the embodiments, those skilled in the art can clearly understand that the present application can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the essence of the technical solution of the present application, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application or in certain parts of the embodiments.

The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.

The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types.
The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

The above is only a preferred embodiment of the present application and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application falls within its scope of protection.

[Brief Description of the Drawings]

To explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of obtaining a word weight table according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for sorting search results according to an embodiment of the present application;
FIG. 3 is a flowchart of another method for sorting search results according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a device for sorting search results according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another device for sorting search results according to an embodiment of the present application.
[Main component symbol description]

401: word weight table acquisition module
402: word acquisition module
403: word segmentation module
404: combination module
405: query module
406: matching module
501: word weight table acquisition module
502: word acquisition module
503: word segmentation module
504: first minimum-weight calculation module
505: second minimum-weight calculation module
506: matching module
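As a small illustration of how the matching module 506 could order results, the following Python sketch ranks target strings by total edit distance, smallest first. The names `EditWeights`, `total_edit_distance`, and `rank_targets` are hypothetical, and the three components W_I, W_D, and W_C are assumed to have been computed already. Leaving W_C at zero gives the two-term form W_I + W_D.

```python
from typing import NamedTuple


class EditWeights(NamedTuple):
    """Edit-distance components for one target string (hypothetical container)."""
    target: str
    w_insert: float          # W_I: minimum weight of the inserted words
    w_delete: float          # W_D: minimum weight of the deleted words
    w_replace: float = 0.0   # W_C: minimum weight of the replacement words


def total_edit_distance(e: EditWeights) -> float:
    """W_total = W_I + W_D + W_C."""
    return e.w_insert + e.w_delete + e.w_replace


def rank_targets(candidates):
    """Return candidates ordered by total edit distance, best match first."""
    return sorted(candidates, key=total_edit_distance)


if __name__ == "__main__":
    results = [
        EditWeights("red leather shoes", 0.4, 0.0),
        EditWeights("blue shoes", 0.6, 0.7, 0.5),
    ]
    for r in rank_targets(results):
        print(r.target, total_edit_distance(r))
```

Because `sorted` is stable, target strings with equal total edit distance keep the order in which the search returned them.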

Claims (1)

VII. Scope of the patent application:

1. A method for sorting search results, characterized in that a server pre-calculates a semantic association weight between every two words in a statistical sample, so as to obtain and store a word weight table, the method further comprising:
the server receiving a query string input by a user terminal, and searching according to the query string to obtain target strings;
the server segmenting the query string and each target string into words, and combining each segmented word of the query string in turn with each segmented word of the target string to form pairs;
querying the word weight table to obtain a weight value for each pair of segmented words; and
obtaining a weighted word length according to the weight values, sorting the target strings according to the weighted word length, and feeding the result back to the user terminal.

2. The method of claim 1, wherein the step of the server pre-calculating the semantic association weight between every two words in the statistical sample to obtain the word weight table comprises:
the server obtaining the statistical sample;
selecting a first word and a second word from the statistical sample, and counting the number of times $C(\text{first word}, \text{second word})$ that the first word and the second word co-occur in the statistical sample;
counting the number of occurrences of the second word in the statistical sample, $\sum C(Y_i, \text{second word})$, where $Y_i$ denotes each word that co-occurs with the second word;
calculating the probability of the first word given that the second word occurs, $P(\text{first word} \mid \text{second word}) = C(\text{first word}, \text{second word}) / \sum C(Y_i, \text{second word})$;
when the second word is queried, taking the semantic relevance weight between the first word and the second word as $W = 1 - P$, where $W$ is the weight and $P$ is the probability of the first word given that the second word occurs; and
repeating the above steps to obtain in turn the semantic relevance weight of each word in the statistical sample relative to the other words, so as to obtain the word weight table.

3. The method of claim 2, wherein the source of the statistical sample comprises text or symbols in any form, the text including web page text, user search logs, and user click logs.

4. The method of claim 1, wherein the weighted word length is a minimum sliding window weighted length, and the step of obtaining the weighted word length according to the weight values and sorting the target strings comprises:
taking, for each segmented word of the target string, its minimum weight over the segmented words of the query string; or taking, for each segmented word of the query string, its minimum weight over the segmented words of the target string;
calculating, for each target string, the minimum sliding window weighted length according to the minimum weights; and
comparing the minimum sliding window weighted lengths of the target strings, a target string with a smaller length being ranked earlier and one with a larger length being ranked later.

5. The method of claim 4, wherein calculating the minimum sliding window weighted length of each target string specifically comprises:
the minimum sliding window weighted length being $\sum_{i=k}^{h} \min_{j=1}^{m} w(T_i, Q_j)$,
where $w$ denotes the weight, $T_i$ denotes the $i$-th segmented word in the target string, $k$ and $h$ denote the start position and end position of the minimum sliding window in the target string, $Q_j$ denotes the $j$-th segmented word in the query string, and $m$ denotes the number of segmented words in the query string.

6. A method for sorting search results, characterized in that a server pre-calculates a semantic association weight between every two words in a statistical sample, so as to obtain and store a word weight table, the method further comprising:
the server receiving a query string input by a user terminal, and searching according to the query string to obtain target strings;
the server segmenting the query string and each target string into words;
the server calculating, according to the stored word weight table, the minimum weight of the inserted words relative to the segmented words of the query string;
the server calculating, according to the word weight table, the minimum weight of the deleted words relative to the segmented words of the target string; and
calculating a total edit distance according to the minimum weights, sorting the target strings according to the total edit distance, and feeding the result back to the user terminal.

7. The method of claim 6, wherein the step of calculating, according to the word weight table, the minimum weight of the inserted words relative to the segmented words of the query string comprises:
obtaining, according to the word weight table, the weight values of the inserted words relative to the segmented words of the query string; and
calculating the minimum weight of the inserted words relative to the segmented words of the query string as $W_I = \sum_{t=1}^{n} \min_{j=1}^{m} w(I_t, Q_j)$,
where $w$ denotes the weight, $I_t$ denotes the $t$-th segmented word in the inserted substring, $n$ denotes the number of inserted segmented words, $Q_j$ denotes the $j$-th segmented word in the query string, and $m$ denotes the number of segmented words in the query string.

8. The method of claim 6, wherein the step of calculating, according to the word weight table, the minimum weight of the deleted words relative to the segmented words of the target string comprises:
obtaining, according to the word weight table, the weight values of the deleted words relative to the segmented words of the target string; and
calculating the minimum weight of the deleted words relative to the segmented words of the target string as $W_D = \sum_{d=1}^{p} \min_{i=1}^{q} w(T_i, D_d)$,
where $w$ denotes the weight, $T_i$ denotes the $i$-th segmented word in the target string, $q$ denotes the number of segmented words in the target string, $D_d$ denotes the $d$-th segmented word among the deleted words, and $p$ denotes the number of deleted segmented words.

9. The method of claim 6, wherein the step of calculating the total edit distance according to the minimum weights and sorting the target strings comprises:
determining, for each target string, the total edit distance $W_{total} = W_I + W_D$,
where $W_{total}$ denotes the total edit distance, $W_I$ denotes the minimum weight of the inserted words relative to the segmented words of the query string, and $W_D$ denotes the minimum weight of the deleted words relative to the segmented words of the target string; and
comparing the total edit distances of the target strings, a target string with a smaller total edit distance being ranked earlier and one with a larger total edit distance being ranked later.
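The weight table construction of claim 2 can be sketched in Python as follows. This is a minimal sketch under assumptions that the claims do not state: two words are taken to co-occur when they appear in the same sample entry (for example, one logged query), each entry counts at most once per word pair, and the function name `build_weight_table` is hypothetical.

```python
from collections import Counter
from itertools import permutations


def build_weight_table(samples):
    """Build {(first, second): W} with W = 1 - P(first | second).

    samples: iterable of token lists, one list per sample entry
    (co-occurrence within an entry is an assumption of this sketch).
    """
    # C(first, second): number of entries in which both words appear.
    cooc = Counter()
    for tokens in samples:
        for first, second in permutations(set(tokens), 2):
            cooc[(first, second)] += 1

    # Denominator for each second word: sum over Y_i of C(Y_i, second).
    total = Counter()
    for (_, second), count in cooc.items():
        total[second] += count

    # W = 1 - C(first, second) / sum_i C(Y_i, second)
    return {
        (first, second): 1.0 - count / total[second]
        for (first, second), count in cooc.items()
    }


if __name__ == "__main__":
    logs = [["red", "shoes"], ["red", "dress"], ["red", "shoes", "sale"]]
    for pair, weight in sorted(build_weight_table(logs).items()):
        print(pair, round(weight, 3))
```

A pair that never co-occurs has no entry in the table. A caller could treat such a pair as having P = 0 and therefore weight 1, although the claims do not say how missing pairs are handled.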
10. The method of claim 6, wherein, before the total edit distance is calculated, the method further comprises: calculating the minimum weight of the edit distance of the replacement words; and
the step of calculating the total edit distance according to the minimum weights to determine the degree of matching between the query string and the target string comprises:
determining, for each target string, the total edit distance $W_{total} = W_I + W_D + W_C$,
where $W_{total}$ denotes the total edit distance, $W_I$ denotes the minimum weight of the inserted words relative to the segmented words of the query string, $W_D$ denotes the minimum weight of the deleted words relative to the segmented words of the target string, and $W_C$ denotes the minimum weight of the replacement words relative to the segmented words of the query string and/or the target string; and
comparing the total edit distances of the target strings, a target string with a smaller total edit distance being ranked earlier and one with a larger total edit distance being ranked later.

11. The method of claim 10, wherein the minimum weight of the edit distance of the replacement words is obtained by:
setting the minimum weight of the edit distance of the replacement words equal to a preset fixed value; or
setting the edit distance of the replacement words equal to the sum, the average, or the larger of the minimum weight of the inserted words relative to the segmented words of the query string and the minimum weight of the deleted words relative to the segmented words of the target string.

12. A device for sorting search results, characterized by comprising:
a word weight table acquisition module, configured to calculate a semantic association weight between every two words in a statistical sample, and to obtain and store a word weight table;
a word acquisition module, configured to receive a query string input by a user terminal, and to search according to the query string to obtain target strings;
a word segmentation module, configured to segment the query string and each target string into words after the server obtains the query string and the target string;
a combination module, configured to combine each segmented word of the query string in turn with each segmented word of the target string to form pairs;
a query module, configured to query the word weight table to obtain a weight value for each pair of segmented words; and
a matching module, configured to obtain a weighted word length according to the weight values, sort the target strings, and feed the result back to the user terminal.

13. The device of claim 12, wherein the word weight table acquisition module comprises:
a sample acquisition module, configured to obtain the statistical sample;
a first statistics module, configured to select a first word and a second word from the statistical sample, and to count the number of times $C(\text{first word}, \text{second word})$ that the first word and the second word co-occur in the statistical sample;
a second statistics module, configured to count the number of occurrences of the second word in the statistical sample, $\sum C(Y_i, \text{second word})$, where $Y_i$ denotes each word that co-occurs with the second word;
a probability calculation module, configured to calculate the probability of the first word given that the second word occurs, $P(\text{first word} \mid \text{second word}) = C(\text{first word}, \text{second word}) / \sum C(Y_i, \text{second word})$;
a weight calculation module, configured to take, when the second word is queried, the semantic relevance weight between the first word and the second word as $W = 1 - P$, where $W$ is the weight and $P$ is the probability of the first word given that the second word occurs; and
a generation module, configured to generate the word weight table after the semantic relevance weight of each word in the statistical sample relative to the other words is obtained.

14. The device of claim 12, wherein, when the weighted word length is a minimum sliding window weighted length, the matching module comprises:
a minimum-weight acquisition module, configured to take, for each segmented word of the target string, its minimum weight over the segmented words of the query string; or to take, for each segmented word of the query string, its minimum weight over the segmented words of the target string;
a first calculation module, configured to calculate, for each target string, the minimum sliding window weighted length according to the minimum weights; and
a sorting module, configured to compare the minimum sliding window weighted lengths of the target strings, a target string with a smaller length being ranked earlier and one with a larger length being ranked later.
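The minimum sliding window weighted length of claims 4, 5, and 14 can be sketched in Python as follows. The claims define the sum over the window but not how the window is located, so this sketch makes several assumptions: the window is the shortest contiguous run of target words that contains every query word occurring in the target, identical words have weight 0, pairs missing from the table have weight 1, and positions are zero-based and inclusive. All function names are hypothetical.

```python
def pair_weight(table, a, b, default=1.0):
    """Look up w(a, b); identical words cost 0 (an assumption of this sketch)."""
    if a == b:
        return 0.0
    return table.get((a, b), default)


def min_cover_window(target, query):
    """Return (k, h) for the shortest window of target covering the shared query words."""
    needed = set(query) & set(target)
    if not needed:
        return None
    best = None
    for k in range(len(target)):
        seen = set()
        for h in range(k, len(target)):
            if target[h] in needed:
                seen.add(target[h])
            if seen == needed:
                if best is None or h - k < best[1] - best[0]:
                    best = (k, h)
                break
    return best


def window_weighted_length(target, query, k, h, table):
    """Sum over i = k..h of the minimum over j of w(T_i, Q_j)."""
    return sum(
        min(pair_weight(table, t, q) for q in query)
        for t in target[k:h + 1]
    )


if __name__ == "__main__":
    table = {("leather", "shoes"): 0.4, ("leather", "red"): 0.7}
    target = ["cheap", "red", "leather", "shoes", "sale"]
    query = ["red", "shoes"]
    k, h = min_cover_window(target, query)
    print((k, h), window_weighted_length(target, query, k, h, table))
```

The brute-force window search is quadratic in the length of the target string, which is acceptable for short titles; a two-pointer scan would be the usual replacement for long texts.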
15. A device for sorting search results, characterized by comprising:
a word weight table acquisition module, configured to calculate a semantic association weight between every two words in a statistical sample, and to obtain and store a word weight table;
a word acquisition module, configured to receive a query string input by a user terminal, and to search according to the query string to obtain target strings;
a word segmentation module, configured to segment the query string and each target string into words after the server obtains the query string and the target string;
a first minimum-weight calculation module, configured to calculate the minimum weight of the inserted words relative to the segmented words of the query string;
a second minimum-weight calculation module, configured to calculate the minimum weight of the deleted words relative to the segmented words of the target string; and
a matching module, configured to calculate a total edit distance according to the minimum weights, sort the target strings, and feed the result back to the user terminal.

16. The device of claim 15, wherein the matching module comprises:
a first total edit distance calculation module, configured to determine, for each target string, the total edit distance $W_{total} = W_I + W_D$,
where $W_{total}$ denotes the total edit distance, $W_I$ denotes the minimum weight of the inserted words relative to the segmented words of the query string, and $W_D$ denotes the minimum weight of the deleted words relative to the segmented words of the target string; and
a sorting module, configured to compare the total edit distances of the target strings, a target string with a smaller total edit distance being ranked earlier and one with a larger total edit distance being ranked later.

17. The device of claim 15, wherein the device further comprises:
a third minimum-weight calculation module, configured to obtain the minimum weight of the edit distance of the replacement words before the total edit distance is calculated;
and the matching module comprises:
a second total edit distance calculation module, configured to determine, for each target string, the total edit distance $W_{total} = W_I + W_D + W_C$,
where $W_{total}$ denotes the total edit distance, $W_I$ denotes the minimum weight of the inserted words relative to the segmented words of the query string, $W_D$ denotes the minimum weight of the deleted words relative to the segmented words of the target string, and $W_C$ denotes the minimum weight of the replacement words relative to the segmented words of the query string and/or the target string; and
a sorting module, configured to compare the total edit distances of the target strings, a target string with a smaller total edit distance being ranked earlier and one with a larger total edit distance being ranked later.
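The three minimum-weight calculation modules can be illustrated with the Python sketch below. It assumes that the inserted and deleted words have already been identified, that the query and target word lists are non-empty, and that a word pair missing from the weight table has weight 1; the function names and the `mode` parameter are hypothetical. The four replacement modes mirror the options the claims list for the replacement weight: a preset fixed value, or the sum, average, or larger of the insertion and deletion weights.

```python
def insertion_weight(inserted, query, table, default=1.0):
    """W_I: sum over inserted words of their minimum weight against the query words."""
    return sum(
        min(table.get((ins, q), default) for q in query)
        for ins in inserted
    )


def deletion_weight(deleted, target, table, default=1.0):
    """W_D: sum over deleted words of the minimum w(T_i, D_d) against the target words."""
    return sum(
        min(table.get((t, d), default) for t in target)
        for d in deleted
    )


def replacement_weight(w_insert, w_delete, mode="fixed", fixed=1.0):
    """W_C chosen by one of the four listed options."""
    if mode == "fixed":
        return fixed
    if mode == "sum":
        return w_insert + w_delete
    if mode == "average":
        return (w_insert + w_delete) / 2
    if mode == "max":
        return max(w_insert, w_delete)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    table = {("leather", "shoes"): 0.4, ("leather", "red"): 0.7}
    query = ["red", "shoes"]
    target = ["leather", "shoes"]
    w_i = insertion_weight(["leather"], query, table)   # "leather" was inserted
    w_d = deletion_weight(["red"], target, table)       # "red" was deleted
    w_c = replacement_weight(w_i, w_d, mode="max")
    print(w_i, w_d, w_c, w_i + w_d + w_c)
```

An empty list of inserted or deleted words contributes a weight of zero, so a target string identical to the query has a total edit distance of zero under the two-term form.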
TW099106782A 2010-03-09 2010-03-09 Methods and devices for sorting search results TWI486797B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW099106782A TWI486797B (en) 2010-03-09 2010-03-09 Methods and devices for sorting search results

Publications (2)

Publication Number Publication Date
TW201131395A true TW201131395A (en) 2011-09-16
TWI486797B TWI486797B (en) 2015-06-01

Family

ID=50180358

Country Status (1)

Country Link
TW (1) TWI486797B (en)

Also Published As

Publication number Publication date
TWI486797B (en) 2015-06-01

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees