TW201131395A

TW201131395A - Method and apparatus for carrying out sorting the search result

Info

Publication number: TW201131395A
Application number: TW99106782A
Authority: TW
Inventors: Yu-heng Xie; Fei Xing; Ning Guo; Lei Hou; Qin Zhang
Original assignee: Alibaba Group Holding Ltd
Priority date: 2010-03-09
Filing date: 2010-03-09
Publication date: 2011-09-16
Also published as: TWI486797B

Abstract

This application discloses a method and an apparatus for sorting search results. In the method, a server calculates in advance the semantic association weight between every two words in a statistical sample, and obtains and stores a word weight table. The server then receives a query string entered at a user terminal and searches according to the query string to obtain target strings; segments the query string and each target string into words and pairs each segmented word of the query string in turn with each segmented word of the target string; looks up the word weight table to obtain the weight of each word pair; derives a weighted word length from those weights; and sorts the target strings by weighted word length before returning them to the user terminal. By introducing word weights that represent the semantic association between the query string and the target string, the application reflects the degree of match between target and query strings more accurately, and it is simple to apply in practice and effective.

Description

VI. Description of the Invention

[Technical Field]

This application relates to the field of computer data processing, and in particular to a method and an apparatus for sorting search results.

[Prior Art]

A search engine needs to estimate how well a search result (target string) matches the query string from the distances between the positions at which the words of the query string occur in the result. Results in which the query words lie close together generally match better and are therefore ranked higher. For example, if the query string is "disinfection machine" (消毒機), a result containing "disinfection machine" is usually closer to the user's intent than "disinfecting industrial washing machine" (消毒工業洗衣機), which in turn is closer than "disinfection equipment, dehydrator, dryer" (消毒設備、脫水器、烘乾機). All of this affects the ranking of the search results.

One conventional way to compute the distance between several query words in the target string is the minimum sliding window: find the shortest possible interval in the target string that contains every character and word of the query string, and use the length of that interval to describe how far apart the query words lie in the target string. For example, if the query string is "我|看|風景" ("I | watch | scenery") and the target string is "我|在|橋|上|看|風景|，|看|風景|的|人|在|橋|下|看|我|。" (the vertical bars mark the word segmentation result), the minimum sliding window is "我|在|橋|上|看|風景", whose length is 6 words.

Another way to measure word length is the edit distance. Unlike the minimum sliding window, it does not measure the word length within a single string; it is the total length of the parts in which two strings differ. For example, "我和你" ("me and you") and "大和小" ("big and small") differ in two words (the first and the third), so their edit distance is 2.

At present, the degree of match between the query string and the target string is usually determined from such a length or distance: the smaller the minimum sliding window length or the edit distance, the better the match, and vice versa. In some cases, however, a plain length or distance does not reflect the degree of match accurately. Suppose the query string is "Nokia battery" (諾基亞電池) and the search results are A, "Nokia battery"; B, "Nokia mobile phone, free battery" (諾基亞手機，贈送電池); and C, "Nokia n73 mobile phone original battery" (諾基亞n73手機原裝電池). By plain distance, "Nokia" and "battery" are 0 words apart in A, which matches best, and 3 words apart in both B and C, neither of which matches well enough. In fact, though, "n73 mobile phone" in C is strongly related to "Nokia", and "original" is strongly related to "battery". Although 3 words separate the query words in both cases, C matches far better than B.
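The conventional, unweighted minimum sliding window can be sketched as follows, to show why results B and C above are indistinguishable by plain distance. This is a minimal illustration, not the patent's implementation: the function name `min_window_length` is hypothetical, and the token lists assume a segmentation in which the comma in B counts as one token, which is consistent with the "3 words" stated above.

```python
from collections import Counter


def min_window_length(query_tokens, target_tokens):
    """Return the length, in tokens, of the shortest span of target_tokens
    that contains every distinct query token, or None if no span does."""
    needed = set(query_tokens)
    if not needed:
        return 0
    counts = Counter()   # occurrences of needed tokens inside the window
    covered = 0          # number of distinct needed tokens inside the window
    best = None
    left = 0
    for right, tok in enumerate(target_tokens):
        if tok in needed:
            counts[tok] += 1
            if counts[tok] == 1:
                covered += 1
        # Shrink from the left while the window still covers the query.
        while covered == len(needed):
            length = right - left + 1
            if best is None or length < best:
                best = length
            out = target_tokens[left]
            if out in needed:
                counts[out] -= 1
                if counts[out] == 0:
                    covered -= 1
            left += 1
    return best


query = ["Nokia", "battery"]
result_a = ["Nokia", "battery"]
result_b = ["Nokia", "mobile phone", ",", "free", "battery"]      # assumed segmentation
result_c = ["Nokia", "n73", "mobile phone", "original", "battery"]

for name, tokens in (("A", result_a), ("B", result_b), ("C", result_c)):
    window = min_window_length(query, tokens)
    # The gap between the two query words is the window length minus 2.
    print(name, "window:", window, "gap:", window - 2)
```

B and C both yield a window of 5 tokens (a gap of 3), so a plain length cannot rank C above B.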
Earlier work has considered the different influence of different words on the distance calculation; for example, word weights can be set according to part of speech (POS). Setting weights by part of speech is still too simple, however. It does not address an essential question, namely whether the query string and the target string are semantically related. The resulting length or distance therefore cannot accurately reflect the degree of match between the query string and the target string; that is, there is no guarantee that target strings semantically related to the query string are ranked first.

[Summary of the Invention]

This application provides a method and an apparatus for sorting search results which, by using the semantic association between the query string and the target strings, sort the target strings more accurately and reflect how well each target string matches the query string.

This application provides a method for sorting search results in which a server calculates in advance the semantic association weight between every two words in a statistical sample, and obtains and saves a word weight table. The method further includes:

the server receives a query string entered at a user terminal, and searches according to the query string to obtain target strings;
the server segments the query string and the target string into words, and pairs each segmented word of the query string in turn with each segmented word of the target string;
the word weight table is looked up to obtain the weight of each word pair; and
a weighted word length is obtained from the weights, and the target strings are sorted by weighted word length and returned to the user terminal.

The step in which the server calculates in advance the semantic association weight between every two words in the statistical sample and obtains the word weight table includes:

the server obtains a statistical sample;
a first word and a second word are selected from the statistical sample, and the number of times they occur together in the sample, C(first word, second word), is counted;
the number of times the second word occurs in the statistical sample, ΣC(Yi, second word), is counted, where Yi ranges over every word that co-occurs with the second word;
the probability of the first word given that the second word occurs is calculated as P(first word | second word) = C(first word, second word) / ΣC(Yi, second word);
when the second word is queried, the semantic relevance weight of the first word and the second word is taken as W = 1 - P, where W is the weight and P is the probability of the first word given that the second word occurs; and
the above steps are repeated to obtain, in turn, the semantic relevance weight of each word in the statistical sample relative to the other words, yielding the word weight table.

The statistical sample may come from text or symbols in any form, the text including web page text, user search logs, and user click logs.
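The weight calculation above, W = 1 - C(first, second) / ΣC(Yi, second), can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `build_weight_table` is hypothetical, and "co-occurrence" is taken to mean that two distinct words appear in the same sample item (one text, one query, or one clicked result), which the source does not specify.

```python
from collections import Counter
from itertools import permutations


def build_weight_table(samples):
    """samples: iterable of token lists (assumed: one list per text or log entry).

    Returns a dict mapping (first, second) to W = 1 - P(first | second),
    where P(first | second) = C(first, second) / sum over Y of C(Y, second).
    """
    co_counts = Counter()
    for tokens in samples:
        # Count each ordered pair of distinct words once per sample item.
        for first, second in permutations(set(tokens), 2):
            co_counts[(first, second)] += 1

    # Total co-occurrences involving each "second" word.
    totals = Counter()
    for (_, second), count in co_counts.items():
        totals[second] += count

    return {
        (first, second): 1.0 - count / totals[second]
        for (first, second), count in co_counts.items()
    }


# Toy sample, invented for illustration only.
samples = [
    ["Nokia", "mobile phone"],
    ["Nokia", "mobile phone"],
    ["Nokia", "battery"],
    ["Apple", "mobile phone"],
]
table = build_weight_table(samples)
print(table[("mobile phone", "Nokia")])  # weight of "mobile phone" when "Nokia" is queried
print(table[("battery", "Nokia")])
```

In this toy sample "mobile phone" co-occurs with "Nokia" more often than "battery" does, so it receives the smaller weight, meaning it adds less to a weighted length.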
The weighted word length may be the weighted length of the minimum sliding window. In that case, the step of obtaining the weighted word length from the weights and sorting the target strings includes:

taking, for each segmented word of the target string, the minimum of its weights relative to the segmented words of the query string; or taking, for each segmented word of the query string, the minimum of its weights relative to the segmented words of the target string;
calculating, for each target string, the weighted length of the minimum sliding window from those minimum weights; and
comparing the weighted minimum sliding window lengths of the target strings, a shorter length being ranked earlier and a longer one later.

The weighted length of the minimum sliding window of each target string is calculated as

$$\sum_{i=k}^{h} W_i = \sum_{i=k}^{h} \min_{j=1}^{m} W(T_i, Q_j)$$

where W denotes a weight, T_i denotes the i-th segmented word of the target string, k and h denote the start and end positions of the minimum sliding window in the target string, Q_j denotes the j-th segmented word of the query string, and m denotes the number of segmented words in the query string.

This application also provides a second method for sorting search results. As in the first method, a server calculates in advance the semantic association weight between every two words in a statistical sample, and obtains and saves a word weight table.
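The weighted minimum sliding window length defined by the formula above can be sketched as follows. This is an illustrative sketch only: the function name, the `default` weight for pairs missing from the table, and all numeric weights are hypothetical, and the window T_k..T_h is assumed to be the shortest span (by token count) that covers every query word. A target word that is itself a query word contributes 0, as the described embodiment states.

```python
def weighted_window_length(query_tokens, target_tokens, weights, default=1.0):
    """Sum of min_j W(T_i, Q_j) over the minimum covering window T_k..T_h.

    weights maps (target_word, query_word) to a weight; pairs that are
    missing from the table fall back to `default` (an assumption).
    Returns None if the target does not contain every query word.
    """
    needed = set(query_tokens)
    if not needed:
        return 0.0

    # Find the shortest span (k, h) that contains every query word.
    best_span = None
    for k in range(len(target_tokens)):
        seen = set()
        for h in range(k, len(target_tokens)):
            if target_tokens[h] in needed:
                seen.add(target_tokens[h])
            if seen == needed:
                if best_span is None or h - k < best_span[1] - best_span[0]:
                    best_span = (k, h)
                break
    if best_span is None:
        return None

    k, h = best_span
    total = 0.0
    for word in target_tokens[k:h + 1]:
        if word in needed:
            continue  # W = 0 when the target word is also a query word
        total += min(weights.get((word, q), default) for q in query_tokens)
    return total


# Hypothetical weights, chosen only to illustrate the ranking effect.
weights = {
    ("n73", "Nokia"): 0.2,
    ("mobile phone", "Nokia"): 0.3,
    ("original", "battery"): 0.4,
    ("free", "battery"): 0.9,
}
query = ["Nokia", "battery"]
b = ["Nokia", "mobile phone", ",", "free", "battery"]
c = ["Nokia", "n73", "mobile phone", "original", "battery"]
ranked = sorted([("B", b), ("C", c)],
                key=lambda item: weighted_window_length(query, item[1], weights))
print([name for name, _ in ranked])
```

With these invented weights C sorts ahead of B even though both have the same unweighted window length, which is the behaviour the weighted length is meant to produce.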
The second method further includes the following steps:

the server receives a query string entered at a user terminal, and searches according to the query string to obtain target strings;
the server segments the query string and the target string into words;
the server calculates, from the saved word weight table, the minimum weight of each inserted word relative to the segmented words of the query string;
the server calculates, from the saved word weight table, the minimum weight of each deleted word relative to the segmented words of the target string; and
a total edit distance is calculated from the minimum weights, and the target strings are sorted by total edit distance and returned to the user terminal.

The step of calculating, from the word weight table, the minimum weight of each inserted word relative to the segmented words of the query string includes: obtaining from the word weight table the weights of the inserted word relative to each segmented word of the query string; and calculating the minimum weights of the inserted words relative to the segmented words of the query string as

$$\sum_{t=1}^{n} W_t = \sum_{t=1}^{n} \min_{j=1}^{m} W(I_t, Q_j)$$

where W denotes a weight, I_t denotes the t-th segmented word of the inserted string, n denotes the number of inserted segmented words, Q_j denotes the j-th segmented word of the query string, and m denotes the number of segmented words in the query string.

The step of calculating, from the word weight table, the minimum weight of each deleted word relative to the segmented words of the target string includes: obtaining from the word weight table the weights of the deleted word relative to each segmented word of the target string; and calculating the minimum weights of the deleted words relative to the segmented words of the target string as

$$\sum_{d=1}^{p} W_d = \sum_{d=1}^{p} \min_{i=1}^{q} W(D_d, T_i)$$

where W denotes a weight, T_i denotes the i-th segmented word of the target string, q denotes the number of segmented words in the target string, D_d denotes the d-th deleted segmented word, and p denotes the number of deleted segmented words.

The step of calculating the total edit distance from the minimum weights and sorting the target strings includes: determining, for each target string, a total edit distance, the total edit distance being:

$$W_{\text{total}} = W_I + W_D$$

where W_total denotes the total edit distance, W_I denotes the minimum weights of the inserted words relative to the segmented words of the query string, and W_D denotes the minimum weights of the deleted words relative to the segmented words of the target string; and comparing the total edit distances of the target strings, a smaller total edit distance being ranked earlier and a larger one later.

Before the total edit distance is calculated, the method may further include calculating the minimum weight of the edit distance of replaced words. The step of calculating the total edit distance from the minimum weights and determining the degree of match between the query string and the target string then includes: determining, for each target string, a total edit distance of

$$W_{\text{total}} = W_I + W_D + W_C$$

where W_total denotes the total edit distance, W_I denotes the minimum weights of the inserted words relative to the segmented words of the query string, W_D denotes the minimum weights of the deleted words relative to the segmented words of the target string, and W_C denotes the minimum weights of the replaced words relative to the segmented words of the query string and/or the target string; and comparing the total edit distances of the target strings, a smaller total edit distance being ranked earlier and a larger one later.

The minimum weight of the edit distance of a replaced word may be obtained by setting it equal to a preset fixed value, or by setting the edit distance of the replaced word equal to the sum, the average, or the maximum of the minimum weight of the inserted words relative to the segmented words of the query string and the minimum weight of the deleted words relative to the segmented words of the target string.
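The total edit distance W_total = W_I + W_D + W_C can be sketched as follows. This is a simplified sketch, not the patent's procedure: the helper names, the `default` weight for missing table entries, and the numeric weights are hypothetical. Inserted words are approximated as target words absent from the query and deleted words as query words absent from the target, and W_C is passed in as a preset value, which corresponds to the "preset fixed value" option above.

```python
def min_weight(word, others, weights, default=1.0):
    """Minimum weight of `word` relative to each word in `others`."""
    return min((weights.get((word, o), default) for o in others), default=default)


def total_edit_distance(query_tokens, target_tokens, weights, default=1.0, w_c=0.0):
    """W_total = W_I + W_D + W_C under the simplifying assumptions above."""
    inserted = [t for t in target_tokens if t not in query_tokens]
    deleted = [q for q in query_tokens if q not in target_tokens]
    # W_I: each inserted word against the segmented words of the query string.
    w_i = sum(min_weight(t, query_tokens, weights, default) for t in inserted)
    # W_D: each deleted word against the segmented words of the target string.
    w_d = sum(min_weight(d, target_tokens, weights, default) for d in deleted)
    return w_i + w_d + w_c


# Hypothetical weights, for illustration only.
weights = {("original", "battery"): 0.4, ("free", "battery"): 0.9}
query = ["Nokia", "mobile phone", "battery"]
targets = {
    "original Nokia mobile phone battery": ["original", "Nokia", "mobile phone", "battery"],
    "Nokia mobile phone, free battery": ["Nokia", "mobile phone", ",", "free", "battery"],
}
for text in sorted(targets, key=lambda s: total_edit_distance(query, targets[s], weights)):
    print(text, total_edit_distance(query, targets[text], weights))
```

A smaller total sorts earlier, so with these invented weights the "original" result ranks ahead of the "free battery" result.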
This application also provides an apparatus for sorting search results, comprising:

a word weight table acquisition module, configured to calculate the semantic association weight between every two words in a statistical sample, and to obtain and save a word weight table;
a word acquisition module, configured to receive a query string entered at a user terminal, and to search according to the query string to obtain target strings;
a word segmentation module, configured to segment the query string and the target string into words after the server has obtained them;
a combination module, configured to pair each segmented word of the query string in turn with each segmented word of the target string;
a lookup module, configured to look up the word weight table and obtain the weight of each word pair; and
a matching module, configured to obtain a weighted word length from the weights, sort the target strings, and return them to the user terminal.

The word weight table acquisition module includes:

a sample acquisition module, configured to obtain the statistical sample;
a first counting module, configured to select a first word and a second word from the statistical sample and count the number of times they occur together in the sample, C(first word, second word);
a second counting module, configured to count the number of times the second word occurs in the statistical sample, ΣC(Yi, second word), where Yi ranges over every word that co-occurs with the second word;
a probability calculation module, configured to calculate the probability of the first word given that the second word occurs, P(first word | second word) = C(first word, second word) / ΣC(Yi, second word);
a weight calculation module, configured to take, when the second word is queried, the semantic relevance weight of the first word and the second word as W = 1 - P, where W is the weight and P is the probability of the first word given that the second word occurs; and
a generation module, configured to generate the word weight table once the semantic relevance weight of each word in the statistical sample relative to the other words has been obtained.

When the weighted word length is the weighted length of the minimum sliding window, the matching module includes:

a minimum weight acquisition module, configured to take, for each segmented word of the target string, the minimum of its weights relative to the segmented words of the query string, or to take, for each segmented word of the query string, the minimum of its weights relative to the segmented words of the target string;
a first calculation module, configured to calculate, for each target string, the weighted length of the minimum sliding window from the minimum weights; and
a sorting module, configured to compare the weighted minimum sliding window lengths of the target strings, a shorter length being ranked earlier and a longer one later.

This application also provides another apparatus for sorting search results, comprising:

a word weight table acquisition module, configured to calculate the semantic association weight between every two words in a statistical sample, and to obtain and save a word weight table;
a word acquisition module, configured to receive a query string entered at a user terminal, and to search according to the query string to obtain target strings;
a word segmentation module, configured to segment the query string and the target string into words after the server has obtained them;
a first minimum weight calculation module, configured to calculate the minimum weight of each inserted word relative to the segmented words of the query string;
a second minimum weight calculation module, configured to calculate the minimum weight of each deleted word relative to the segmented words of the target string; and
a matching module, configured to calculate a total edit distance from the minimum weights, sort the target strings, and return them to the user terminal.

The matching module includes:

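a first total edit distance calculation module, configured to determine, for each target string, a total edit distance of

$$W_{\text{total}} = W_I + W_D$$

where W_total denotes the total edit distance, W_I denotes the minimum weights of the inserted words relative to the segmented words of the query string, and W_D denotes the minimum weights of the deleted words relative to the segmented words of the target string; and
a sorting module, configured to compare the total edit distances of the target strings, a smaller total edit distance being ranked earlier and a larger one later.

The apparatus further includes a third minimum weight calculation module, configured to obtain the minimum weight of the edit distance of replaced words before the total edit distance is calculated. The matching module then includes:

a second total edit distance calculation module, configured to determine, for each target string, a total edit distance of

$$W_{\text{total}} = W_I + W_D + W_C$$

where W_total denotes the total edit distance, W_I denotes the minimum weights of the inserted words relative to the segmented words of the query string, W_D denotes the minimum weights of the deleted words relative to the segmented words of the target string, and W_C denotes the minimum weights of the replaced words relative to the segmented words of the query string and/or the target string; and
a sorting module, configured to compare the total edit distances of the target strings, a smaller total edit distance being ranked earlier and a larger one later.

Conventional simple word length or distance calculations do not consider how closely the words in the target string are semantically associated with the query words. By introducing word weights that represent the semantic association between the query string and the target string, this application sorts the target strings more accurately, ranks target strings semantically related to the query string first, and reflects how well each target string matches the query string. It is simple to apply in practice and effective.

[Embodiments]

The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of this application.

This application adds a semantic factor to the calculation of word distance or word length. By considering the semantic association between the query string and the target string, it measures their degree of match better, so that the results of a search engine can be ranked more reasonably. The application can of course be used wherever string matching is computed and is not limited to search engines.

Because this application considers the semantics between strings, it needs the semantic association weight between every two words. The following first explains how these weights are obtained to produce the word weight table. Referring to Fig. 1, the procedure includes the following steps:

Step 101: the server obtains a statistical sample.

The statistical sample may come from text or symbols in any form, the text including web page text, user search logs, user click logs, and so on.

Generally speaking, the more often the first word and the second word occur together in the statistical sample, the more related they are. For example, if "Nokia" and "mobile phone" often appear together in text, or users often search for "Nokia" and then click on results containing "mobile phone", this indicates to some extent that "Nokia" and "mobile phone" are highly related. It is therefore no surprise if the results of a search for "Nokia" contain "mobile phone".

Step 102: select a first word and a second word from the statistical sample, and count the number of times they occur together in the sample, C(first word, second word).

For example, count the number of co-occurrences of "mobile phone" and "Nokia", C(mobile phone, Nokia). The weights of all words (when each word is searched for) can then be derived and finally output.

Step 103: count the number of times the second word occurs in the statistical sample, ΣC(Yi, second word), where Yi ranges over every word that co-occurs with the second word.

For example, count the total number of co-occurrences of "Nokia" with other words (that is, the total number of occurrences of "Nokia"), ΣC(Yi, Nokia), where Yi ranges over every word that co-occurs with "Nokia".

Step 104: calculate the probability of the first word given that the second word occurs, P(first word | second word) = C(first word, second word) / ΣC(Yi, second word).

For example, the probability of "mobile phone" given that "Nokia" occurs is P(mobile phone | Nokia) = C(mobile phone, Nokia) / ΣC(Yi, Nokia).

Step 105: when the second word is queried, take the semantic relevance weight of the first word and the second word as W = 1 - P, where W is the weight and P is the probability of the first word given that the second word occurs.

For example, take W = 1 - P as the semantic relevance weight of "mobile phone" and "Nokia" when "Nokia" is queried.

In this example the weight is 1 minus the conditional probability of the first word given the second word. Other embodiments may express the weight differently, for example by using P directly as the weight.

Step 106: determine whether all words in the statistical sample have been processed. If so, go to step 107; otherwise repeat the above steps to obtain in turn the semantic relevance weight of each word in the statistical sample relative to the other words.

Step 107: output the semantic relevance weight of each word in the statistical sample relative to the other words, yielding the word weight table.

For example, one possible form of the word weight table is shown in Table 1:

Table 1

Word 1         Word 2         Weight
first word     second word    W12
first word     third word     W13
second word    third word     W23
m-th word      n-th word      Wmn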
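Note that the word weight table in Table 1 is only one specific embodiment. In practice the table may take other forms, and its form is not limited here.

At this point the word weight table has been obtained, that is, the weight of the first word when the second word is queried.

Note also that the word weights may be obtained in any way. Fig. 1 shows only one specific embodiment, in which statistical probabilities are obtained from a statistical language model. In practice other approaches may be used, such as any automatic calculation or manual setting; the way the word weight table is obtained is not limited here.

Fig. 2 is a flowchart of a method for sorting search results according to an embodiment of this application. It includes the following steps:

Step 201: the server obtains the query string and the target strings.

The query string is usually entered by the user, and a target string is usually a string related to the query string that the server obtains by searching. For example, if the query string entered by the user is "Nokia battery" and the search returns A, "Nokia battery"; B, "Nokia mobile phone, free battery"; and C, "Nokia n73 mobile phone original battery", then A, B, and C are all target strings. The purpose of this embodiment is to determine how well each target string (such as search results A, B, and C) matches the query string.

In other words, the server receives the query string entered at the user terminal, and searches according to it to obtain the target strings.

This embodiment is described using the query string "Nokia battery" and target string C, "Nokia n73 mobile phone original battery", as an example. Target strings A and B are processed in essentially the same way as target string C, so they are not described in detail.

Step 202: the server segments the query string and the target string into words, obtaining the segmented words that make up the query string and those that make up the target string.

Let the query string be Q and the target string be T. Segmenting the query string gives Q1Q2...Qm, and segmenting the target string gives T1T2...Tn.

In this embodiment, segmenting the query string gives Q1Q2 = Nokia | battery, and segmenting the target string gives T1T2T3T4T5 = Nokia | n73 | mobile phone | original | battery.

Segmentation in this application may split the string by any method: into words in the linguistic sense, or into single characters, letters, symbols, and so on.

Step 203: pair each segmented word of the query string in turn with each segmented word of the target string, obtaining a number of word pairs, each consisting of one segmented word of the query string and one segmented word of the target string.

Specifically, this gives (Ti, Q1), (Ti, Q2), ..., (Ti, Qm).

The word pairs in this embodiment are (T1, Q1), (T1, Q2), (T2, Q1), (T2, Q2), (T3, Q1), (T3, Q2), (T4, Q1), (T4, Q2), (T5, Q1), and (T5, Q2).

Step 204: look up the word weight table to obtain the weight of each word pair.

Let W denote a weight. The weights of the word pairs obtained from the weight table are W(T1, Q1), W(T1, Q2), W(T2, Q1), W(T2, Q2), W(T3, Q1), W(T3, Q2), W(T4, Q1), W(T4, Q2), W(T5, Q1), and W(T5, Q2).

Let
W(T1, Q1) = W1, W(T1, Q2) = W1',
W(T2, Q1) = W2, W(T2, Q2) = W2',
W(T3, Q1) = W3, W(T3, Q2) = W3',
W(T4, Q1) = W4, W(T4, Q2) = W4',
W(T5, Q1) = W5, W(T5, Q2) = W5'.

If Ti is in Q, Wi is taken as 0. For example, T1 is "Nokia" and Q1 is also "Nokia", so W(T1, Q1) = W1 = 0; likewise, W(T5, Q2) = W5' = 0.

Step 205: obtain the weighted word length from the weights.

In this embodiment the weighted word length is the weighted length of the minimum sliding window, and step 205 includes the following sub-steps:

i) Obtain, for each segmented word of the target string, the minimum of its weights relative to the segmented words of the query string; or obtain, for each segmented word of the query string, the minimum of its weights relative to the segmented words of the target string.

The two procedures are very similar, so only the first, obtaining for each segmented word of the target string the minimum of its weights relative to the segmented words of the query string, is described below.

For the embodiment above, this means obtaining the smaller of the two weights of T1 relative to Q1 and Q2, the smaller of the two weights of T2 relative to Q1 and Q2, and so on.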
這裏，假設W(T1，Q1)和W(T1，Q2)的權重最小値爲Wl，W(T2，Q1)和W(T2，Q2)的權重最小値爲 W2，W(T3，Q1)和W(T3，Q2)的權重最小値爲W3’ W ( T4，Q1 )和W ( T4，Q2 )的權重最小値爲W4，W ( T5 ，Q1 )和W ( T5，Q2 )的權重最小値爲 W5’。 ii )對各個目標字串，根據所述權重最小値而分別計算最小滑動窗口加權長度；確定每個目標字串的最小滑動視窗加權長度具體包括最小滑動窗口加權長度^二咐总）， /=* ί=* 7=1 其中，W表示權重，Ti表示目標字串中的第i個的分詞，k、h分別表示目標字串最小滑動視窗的起始位置和結束位置，Qj表示查詢字串中的第j個分詞，m表示查詢字串分詞的個數。對於上述實施例，最小滑動窗口加權長度[Wi = Wl + W2 + W3 + W4 + W5 5 重複上述步驟202至205，可以得到查詢字串相對各個目標字串的最小滑動視窗加權長度。步驟206，根據所述加權詞語長度而確定查詢字串和目標字串的匹配程度，亦即根據所述加權詞語長度對每個目標字串進行排序，並反饋給用戶終端。具體上，比較各目標字串的最小滑動視窗加權長度，所述長度越小則匹配程度越高，反之，匹配程度越低，也 -21 - 201131395 即長度越小則排序越靠前，反之，排序越靠後。至此，確定了查詢字串與各目標字串之間的匹配程度。傳統的簡單的詞語長度的計算沒有考慮目標字串中的詞語跟查詢詞語的語義關聯程度，因而得到的詞語長度不能準確地反映查詢和目標的匹配程度。如“諾基亞電池”和“ 諾基亞n73手機原裝電池”，雖然長度差異很大，但是如果査詢詞語是“諾基亞電池”的情況下，兩者沒有很大實質區別。本申請案透過引入表示査詢字串和目標字串的語義關聯度的詞語權重，更準確地對目標字串進行排序，將與查詢字串語義相關的目標字串排在前面，反映出了各目標字串與查詢字串的匹配程度。在實際應用中應用簡單，且效果好。圖3是根據本申請案實施例的另一種對搜索結果進行排序的方法流程圖，本實施例基於編輯距離計算査詢字串和目標字串之間的差異，其中，編輯距離是指從一個字串變化到另一個字串最少需要的基本操作次數，或理解爲兩個字串差異部分的長度之和。通常的基本操作包括插入~ 個字/詞，刪除一個字/詞，替換一個字/詞，或者其他根據需要而設的操作。例如從"我愛你”變化到“我不愛她” 至少需要插入一個“不”、將“你”替換成“她”兩次基本操作，因此兩者的編輯距離爲2，同理，"隱形的翅膀 ”和“好吃的雞翅膀”編輯距離爲3。圖3所示流程具體上包括以下步驟’· 步驟3 0 1，伺服器獲得查詢字串和目標字串。 -22- 201131395 其中，査詢字串通常是用戶輸入的，目標字串通常是伺服器經檢索後得到的與查詢字串相關的字串。例如，査詢字串是“諾基亞手機電池”，目標字串是“原裝諾基亞手機電池”和“諾基亞手機，贈送電池”。也就是說，伺服器接收用戶終端輸入的查詢字串，根據査詢字串進行搜索並獲得目標字串。本申請案實施例的目的就是判斷各目標字串與查詢字串的匹配程度。在本實施例中，以查詢字串爲“諾基亞手機電池”，目標字串爲“原裝諾基亞手機電池”爲例進行說明。對於目標字串“諾基亞手機，贈送電池”，由於其與目標字串 “原裝諾基亞手機電池”的處理過程基本相同，不再詳述 0 步驟3 02，伺服器對所述查詢字串和目標字串分別進行分詞，得到構成査詢字串的分詞和構成目標字串的分詞〇這裏，令查詢字串爲Q，目標字串爲T，對查詢字串分詞後可得到QlQ2...Qm，對目標字串分詞後可得到 T1T2…Τη °在本實施例中，對查詢字串分詞以後得到：Q1Q2Q3 =諾基亞|手機丨電池，對目標字串分詞後得到T1T2T3 =原裝|諾基亞|電池。本申請案中的分詞可以是對字串任意方法的切分，可以分成語言意義上的詞，也可以是分成單字或字母、符號 -23- 201131395The first chief editor distance calculation module is configured to determine a total edit distance for each target string respectively, the total edit distance is _· WiS = W l + WD -13 - 201131395 where Wm represents the total edit The distance, W, indicates that the insertion word has the smallest weight relative to each word segment of the query string, WD 
indicates that the weight of the deleted word relative to the target word string has the smallest weight; and the sorting module is used to compare the total editing distance of each target string. 'The total editing distance is small, the sorting is first, and vice versa, the sorting is later. The apparatus further includes: a third weight minimum 値 computing module for obtaining the editing of the replacement words before calculating the total editing distance length The weight of the distance is the smallest; the matching module includes: a second total edit distance calculation module for determining a total edit distance for each target string respectively, the total edit distance is: W*8 zWf + Wo + wc where w® represents the total edit distance, W, which represents the minimum weight of each word segmentation of the inserted word relative to the query string, and wD represents the minimum weight of each word segmentation of the deleted word relative to the target string, c represents the minimum weight of the replacement word relative to the query string and/or the target word segment; and the sorting module is used to compare the total editing distance of each target string, and the total editing distance is small, and the order is first; , sorted after. Applying the present application, the semantic association of the words in the target string with the query term is not considered in relation to the conventional simple term length or distance calculation, and the present application introduces a semantic association representing the query string and the target string. The weight of the words is more accurately sorted by the target string, and the target string related to the semantics of the query string is ranked in front, reflecting the degree of matching between each target string and the query string. It is easy to apply in practical applications, and -14- 201131395 works well. 
The technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present application. It is obvious that the described embodiments are only a part of the implementation of the present application. For example, not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without departing from the inventive scope are within the scope of the present invention. In this application, a semantic factor is added in calculating the word distance or the length of the word, considering the semantic association between the query string and the target string, and the degree of matching between the query string and the target string is better measured, so that the search engine Search results in can get a more reasonable ranking. Of course, the present application can be applied to any place where the string matching degree is calculated, and is not limited to the semantics of the search engine, because of the semantics between the strings considered in the present application, and thus the semantic association weight between each two words is required. First, how to obtain the semantic association weight between each two words to obtain the word weight table, as shown in Figure 1, specifically includes the following steps: Step 1 0, the server obtains a statistical sample: the source of the statistical sample includes any form Text or symbol, wherein the text includes webpage text, user search logs, user click logs, and the like. In general, the more the first word and the second word appear together in the statistical sample, the more relevant the first word and the second word are. 
For example, -15-201131395, in the text "Nokia" and "mobile phone" often appear together, or users often search for "Nokia" and then click on the results with "mobile phone, can all mean "Nokia" and to some extent “Mobile” is highly relevant, so if the user searches for “Nokia”, the result is “not a surprise” for us. Step 1 02, select the first word and the second word from the statistical sample, count the number The number C (first word, second word) in which a word and a second word appear together in a statistical sample; for example, the number of co-occurrences of "mobile phone" and "Nokia" C (mobile phone, Nokia)' and then available Out, finally the weight of all words (when searching for each word). Step 1 03 'Statistics the number of times the second word appears in the statistical sample (Yi 'second word), where 'Yi stands for each Words in which two words co-occur; for example, the total number of times that 'statistics Nokia' co-occurred with other words is the total number of occurrences of "Nokia") Σ C (Y i Nokia), where γ i represents each with " Nokia term co-occurrence. " Step 1 04 'Calculate the probability P of the first word under the condition of occurrence of the second word (first word 丨 second word) = C (first word, second word) /EC ( Yi, second word): for example You can get the "Mobile" in the "Nokia" situation under the rate P (mobile phone 丨 Nokia) = C (mobile phone, Nokia) / £ C (Yi, Nokia). Step 1 〇5, when querying the second word, take the semantic relevance weight of the first word and the second word-16-201131395 as w=ip; where W is the weight and P is the first word appearing in the second word The rate of exchange under conditions. For example, take W=lP as the semantic relevance weight of “mobile phone” and “Nokia” when querying “Kyoki.” In this example, the weight is 1 minus the conditional rate of the first word in the presence of the second word. 
In other embodiments, the weight may be expressed in other ways, for example by using P directly as the weight. Step 106: determine whether all the words in the statistical sample have been processed; if so, perform step 107; otherwise repeat the above steps to obtain, in turn, the semantic relevance weight of each word in the statistical sample relative to the other words. Step 107: output the semantic relevance weights of each word in the statistical sample relative to the other words to obtain the word weight table. For example, one possible form of the word weight table is shown in Table 1:
Table 1
Word 1 | Word 2 | Weight
first word | second word | W12
first word | third word | W13
second word | third word | W23
word m | word n | Wmn
It should be noted that the word weight table shown in Table 1 is merely one specific embodiment. In practical applications, the word weight table may take other forms; its form is not limited here. At this point, the word weight table has been obtained, that is, the weight of the first word when the second word is queried. The weight can be obtained in any way: FIG. 1 only shows an example based on a statistical language model, and in practical applications other methods, such as any automatic calculation or manual setting, can be used. The manner in which the word weight table is obtained is not limited here. FIG. 2 is a flowchart of a method for sorting search results according to an embodiment of the present application, which specifically includes the following steps: Step 201: the server obtains a query string and target strings. The query string is usually input by the user, and a target string is usually a string related to the query string that the server obtains by searching. For example, the query string entered by the user is "Nokia battery", and the target strings obtained by the server are A "Nokia battery", B "Nokia mobile phone, free battery", and C "Nokia n73 mobile phone original battery"; A, B, and C obtained through the search are all target strings. The purpose of this embodiment is to determine the degree of matching between each target string (such as search results A, B, and C) and the query string. That is, the server receives the query string input by the user terminal, searches according to the query string, and obtains the target strings. This embodiment is described using the query string "Nokia battery" and target string C "Nokia n73 mobile phone original battery" as an example. Target strings A "Nokia battery" and B "Nokia mobile phone, free battery" are processed in essentially the same way as target string C, so the details are not repeated. Step 202: the server segments the query string and the target string separately to obtain the segments of the query string and the segments of the target string. Let the query string be Q and the target string be T; segmenting the query string yields Q1Q2...Qm, and segmenting the target string yields T1T2...Tn. In this embodiment, segmenting the query string yields Q1Q2 = Nokia | battery, and segmenting the target string yields T1T2T3T4T5 = Nokia | n73 | mobile phone | original | battery. Segmentation in the present application may divide a string by any method: into words in the linguistic sense, or into single characters, letters, symbols, and the like.
Step 203: combine each segment of the query string in turn with the segments of the target string, pairwise, to obtain multiple segment combinations, each consisting of one query-string segment and one target-string segment; specifically, obtain (Ti, Q1), (Ti, Q2), ..., (Ti, Qm). The segment combinations obtained in this embodiment are: (T1, Q1), (T1, Q2), (T2, Q1), (T2, Q2), (T3, Q1), (T3, Q2), (T4, Q1), (T4, Q2), (T5, Q1), (T5, Q2). Step 204: query the word weight table to obtain the weight of each segment combination. Let W denote the weight; the weights of the segment combinations obtained from the word weight table are: W(T1, Q1), W(T1, Q2), W(T2, Q1), W(T2, Q2), W(T3, Q1), W(T3, Q2), W(T4, Q1), W(T4, Q2), W(T5, Q1), W(T5, Q2). Let W(T1, Q1) = W1, W(T1, Q2) = W1', W(T2, Q1) = W2, W(T2, Q2) = W2', W(T3, Q1) = W3, W(T3, Q2) = W3', W(T4, Q1) = W4, W(T4, Q2) = W4', W(T5, Q1) = W5, W(T5, Q2) = W5'. If Ti occurs in Q, the corresponding weight is taken as 0; for example, T1 is "Nokia" and Q1 is also "Nokia", so W(T1, Q1) = W1 = 0; similarly, W(T5, Q2) = W5' = 0. Step 205: obtain the weighted word length according to the weights. In this embodiment, the weighted word length is the minimum sliding window weighted length. In this case, step 205 specifically includes the following steps: i) obtain, for each segment of the target string, the minimum of its weights with respect to the segments of the query string; or obtain, for each segment of the query string, the minimum of its weights with respect to the segments of the target string. The two procedures are very similar, so only the first case (the minimum weight of each target-string segment with respect to the query-string segments) is described below.
Specifically, in the above embodiment, it is necessary to obtain the minimum of the two weights of T1 with respect to Q1 and Q2, the minimum of the two weights of T2 with respect to Q1 and Q2, and so on. Suppose the minimum of W(T1, Q1) and W(T1, Q2) is W1, the minimum of W(T2, Q1) and W(T2, Q2) is W2, the minimum of W(T3, Q1) and W(T3, Q2) is W3, the minimum of W(T4, Q1) and W(T4, Q2) is W4, and the minimum of W(T5, Q1) and W(T5, Q2) is W5'. ii) For each target string, calculate the minimum sliding window weighted length according to the minimum weights. The minimum sliding window weighted length of each target string is determined as $\sum_{i=k}^{h} W_i = \sum_{i=k}^{h} \min_{j=1}^{m} W(T_i, Q_j)$, where W is the weight, Ti is the i-th segment of the target string, k and h are the start and end positions of the minimum sliding window in the target string, Qj is the j-th segment of the query string, and m is the number of query-string segments. For the above embodiment, the minimum sliding window weighted length is $\sum W_i$ = W1 + W2 + W3 + W4 + W5'. Repeating steps 202 to 205 yields the minimum sliding window weighted length of the query string with respect to each target string. Step 206: determine the degree of matching between the query string and each target string according to the weighted word length, that is, sort the target strings according to the weighted word length and feed the result back to the user terminal. Specifically, compare the minimum sliding window weighted lengths of the target strings: the smaller the length, the higher the degree of matching and the higher the ranking; the larger the length, the lower the degree of matching and the lower the ranking. At this point, the degree of matching between the query string and each target string has been determined.
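Steps 203 to 206 can be sketched as follows in Python. This is an illustration under stated assumptions, not the method as claimed: the names pair_weight, weighted_window_length, and rank_targets are hypothetical; pairs missing from the weight table are assumed to get a default weight of 1.0; weights are assumed non-negative; and the window is chosen as the one with the smallest weighted length among all windows that contain every query segment (the text does not say whether the window is chosen by count or by weight). The weight values are made up.

```python
def pair_weight(table, target_word, query_word, default=1.0):
    """W(Ti, Qj): 0 for identical words, otherwise a table lookup.

    Assumption: pairs missing from the table get `default`.
    """
    if target_word == query_word:
        return 0.0
    return table.get((target_word, query_word), default)


def weighted_window_length(query, target, table):
    """Smallest weighted length of a window of `target` covering all query words.

    Returns None when some query word never occurs in `target`.
    """
    if not query:
        return 0.0
    # Step 205 i): minimum weight of each target segment over the query segments.
    token_w = [min(pair_weight(table, t, q) for q in query) for t in target]
    needed = set(query)
    best = None
    for k in range(len(target)):
        seen, total = set(), 0.0
        for h in range(k, len(target)):
            total += token_w[h]
            if target[h] in needed:
                seen.add(target[h])
            if seen == needed:
                # Weights are non-negative, so extending the window cannot help.
                if best is None or total < best:
                    best = total
                break
    return best


def rank_targets(query, targets, table):
    """Step 206: smaller weighted length ranks first."""
    def key(target):
        length = weighted_window_length(query, target, table)
        return float("inf") if length is None else length
    return sorted(targets, key=key)


# Illustrative (made-up) weights keyed as (target word, query word).
table = {
    ("n73", "Nokia"): 0.25,
    ("mobile phone", "Nokia"): 0.25,
    ("mobile phone", "battery"): 0.5,
    ("original", "battery"): 0.25,
    ("free", "battery"): 0.75,
}
query = ["Nokia", "battery"]
a = ["Nokia", "battery"]
b = ["Nokia", "mobile phone", ",", "free", "battery"]
c = ["Nokia", "n73", "mobile phone", "original", "battery"]
for target in rank_targets(query, [b, c, a], table):
    print(weighted_window_length(query, target, table), target)
```

With these made-up weights, C receives a shorter weighted length than B even though both put the same number of segments between "Nokia" and "battery", which is the behavior the embodiment is aiming for.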
The traditional simple word length calculation does not consider the semantic relevance of the words in the target string to the query words, so the resulting word length does not accurately reflect the degree of matching between the query and the target. For example, "Nokia battery" and "Nokia n73 mobile phone original battery" differ greatly in length, but if the query is "Nokia battery", the two differ little in how well they match. By introducing word weights that represent the semantic relevance between the query string and the target string, the present application sorts the target strings more accurately, ranks target strings that are semantically related to the query string higher, and reflects the degree to which each target string matches the query string. It is simple to apply and effective in practice. FIG. 3 is a flowchart of another method for sorting search results according to an embodiment of the present application. This embodiment calculates the difference between the query string and the target string based on the edit distance, where the edit distance is the minimum number of basic operations required to change one string into another, or the sum of the lengths of the differing parts of the two strings. Common basic operations include inserting a character/word, deleting a character/word, replacing a character/word, or other operations as needed. For example, changing "I love you" to "I don't love her" requires at least the two basic operations of inserting "don't" and changing "you" to "her", so the edit distance between the two is 2; by the same reasoning, the edit distance between "invisible wings" and "good chicken wings" is 3. The flow shown in FIG. 3 specifically includes the following steps: Step 301: the server obtains the query string and the target strings. The query string is usually input by the user, and a target string is usually a string related to the query string that the server obtains by searching.
For example, the query string is "Nokia mobile phone battery" and the target strings are "original Nokia mobile phone battery" and "Nokia mobile phone, free battery". That is, the server receives the query string input by the user terminal, searches according to the query string, and obtains the target strings. The purpose of this embodiment is to determine the degree of matching between each target string and the query string. This embodiment is described using the query string "Nokia mobile phone battery" and the target string "original Nokia mobile phone battery" as an example. The target string "Nokia mobile phone, free battery" is processed in essentially the same way as the target string "original Nokia mobile phone battery", so the details are not repeated. Step 302: the server segments the query string and the target string separately to obtain the segments of the query string and the segments of the target string. Let the query string be Q and the target string be T; segmenting the query string yields Q1Q2...Qm, and segmenting the target string yields T1T2...Tn. In this embodiment, segmenting the query string yields Q1Q2Q3 = Nokia | mobile phone | battery, and segmenting the target string yields T1T2T3 = original | Nokia | battery. Segmentation in the present application may divide a string by any method: into words in the linguistic sense, or into single characters, letters, symbols, and the like.

Step 303: the server calculates, according to the word weight table, the minimum weights of the inserted words with respect to the segments of the query string. Specifically, the weights of the inserted word with respect to each segment of the query string are obtained from the word weight table. In this example, the word "original" is inserted; let the inserted word be I, so the weights of the inserted word with respect to the query-string segments are W(I1, Q1), W(I1, Q2), and W(I1, Q3). The minimum weight of the inserted words with respect to the segments of the query string is calculated as $W_I = \sum_{t=1}^{n} \min_{j=1}^{m} W(I_t, Q_j)$, where W denotes the weight, It is the t-th segment of the inserted words, n is the number of inserted segments, Qj is the j-th segment of the query string, and m is the number of query-string segments. Step 304: calculate, according to the word weight table, the minimum weights of the deleted words with respect to the segments of the target string. Specifically, the weights of the deleted word with respect to each segment of the target string are obtained from the word weight table. In this example, the word "mobile phone" is deleted; let the deleted word be D, so the weights of the deleted word with respect to the target-string segments are W(D1, T1), W(D1, T2), and W(D1, T3). The minimum weight of the deleted words with respect to the segments of the target string is calculated as $W_D = \sum_{d=1}^{p} \min_{i=1}^{q} W(T_i, D_d)$, where W denotes the weight, Ti is the i-th segment of the target string, q is the number of target-string segments, Dd is the d-th segment of the deleted words, and p is the number of deleted segments. Step 305: calculate the total edit distance according to the minimum weights and determine the degree of matching between the query string and the target string, that is, sort the target strings according to the total edit distance and feed the result back to the user terminal. Specifically, the total edit distance is determined for each target string; for one target string it is $W_{total} = W_I + W_D$, where $W_{total}$ is the total edit distance, $W_I$ is the minimum weight of the inserted words with respect to the segments of the query string, and $W_D$ is the minimum weight of the deleted words with respect to the segments of the target string. Compare the total edit distances of the target strings: the smaller the total edit distance, the higher the degree of matching and the higher the ranking; the larger the total edit distance, the lower the degree of matching and the lower the ranking. At this point, the degree of matching between the query string and each target string has been determined. The traditional simple word distance calculation does not consider the semantic relevance of the words in the target string to the query words, so the resulting word distance cannot accurately reflect the degree of matching between the query and the target.
By introducing word weights that represent the semantic relevance between the query string and the target string, the present application sorts the target strings more accurately, ranks target strings that are semantically related to the query string higher, and reflects the degree to which each target string matches the query string. It is simple to apply and effective in practice. It should be noted that the embodiment shown in FIG. 3 may also involve word substitution; for example, when "I and you" is changed to "I and him", "you" can be regarded as replaced by "him". Substitution can be handled as follows. Mode one: treat a substitution as the combination of an insertion and a deletion, that is, regard the substitution operation as nonexistent. For example, when "I and you" is changed to "I and him", "you" is considered deleted and "him" inserted, so all transformations are insertions and deletions and the embodiment shown in FIG. 3 applies directly. Mode two: treat substitution as a third operation in addition to insertion and deletion. For example, when "I and you" is changed to "I and him", "you" is considered replaced by "him"; in this case, the minimum weight of the edit distance of the replaced word needs to be calculated.
Specifically, there are two calculation methods: a) the minimum weight of the edit distance of the replaced word equals a preset fixed value, for example 1; or b) the edit distance of the replaced word equals the sum of the minimum weight of the inserted word with respect to the segments of the query string and the minimum weight of the deleted word with respect to the segments of the target string; or it equals the average of these two minimum weights; or it equals the larger of the two; or any other combination of them. For example, the edit distance of the replacement word "him" = the minimum weight of the inserted "him" with respect to the segments of the query string "I and you" + the minimum weight of the deleted word "you" with respect to the segments of the target string "I and him"; or the edit distance of the replacement word "him" = (the minimum weight of the inserted "him" with respect to the segments of the query string "I and you" + the minimum weight of the deleted word "you" with respect to the segments of the target string "I and him") / 2; and so on.
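The weighted edit distance of FIG. 3, together with the substitution options just listed, can be sketched in Python as follows. This is a hedged illustration rather than the method as claimed: the function names are hypothetical, missing table entries are assumed to weigh 1.0, the weight values are made up, and the inserted, deleted, and replaced segments are found with the standard library's difflib.SequenceMatcher, which produces a reasonable alignment but is not guaranteed to yield the minimum number of edit operations. The total is the sum of the insertion, deletion, and substitution contributions.

```python
from difflib import SequenceMatcher


def pair_weight(table, first, second, default=1.0):
    """W(first, second): 0 for identical words; missing pairs get `default`."""
    if first == second:
        return 0.0
    return table.get((first, second), default)


def weighted_edit_distance(query, target, table, substitution="split",
                           fixed_cost=1.0):
    """Total weighted edit distance between segmented query and target.

    substitution: "split" (mode one: a replacement is a deletion plus an
    insertion), or mode two with "fixed", "mean", or "max".
    """
    w_i = w_d = w_c = 0.0
    matcher = SequenceMatcher(None, query, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        deleted, inserted = query[i1:i2], target[j1:j2]
        # Inserted words are weighed against the query segments.
        ins = sum(min((pair_weight(table, w, q) for q in query), default=1.0)
                  for w in inserted)
        # Deleted words are weighed against the target segments.
        dele = sum(min((pair_weight(table, t, w) for t in target), default=1.0)
                   for w in deleted)
        if tag != "replace" or substitution == "split":
            w_i += ins
            w_d += dele
        elif substitution == "fixed":
            # Assumption: one fixed cost per replaced position.
            w_c += fixed_cost * max(len(deleted), len(inserted))
        elif substitution == "mean":
            w_c += (ins + dele) / 2
        elif substitution == "max":
            w_c += max(ins, dele)
        else:
            raise ValueError("unknown substitution mode: " + substitution)
    return w_i + w_d + w_c


# Illustrative (made-up) weights.
table = {
    ("original", "battery"): 0.25,
    ("Nokia", "mobile phone"): 0.25,
    ("free", "battery"): 0.75,
    ("him", "you"): 0.5,
}
query = ["Nokia", "mobile phone", "battery"]
print(weighted_edit_distance(query, ["original", "Nokia", "battery"], table))
print(weighted_edit_distance(
    query, ["Nokia", "mobile phone", ",", "free", "battery"], table))
print(weighted_edit_distance(
    ["I", "and", "you"], ["I", "and", "him"], table, substitution="mean"))
```

The "split" mode and the sum option of method b) give the same total, which is why only one of them appears as a mode here.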
In mode two, step 305 specifically includes: determining the total edit distance for each target string as $W_{total} = W_I + W_D + W_C$, where $W_{total}$ is the total edit distance, $W_I$ is the minimum weight of the inserted words with respect to the segments of the query string, $W_D$ is the minimum weight of the deleted words with respect to the segments of the target string, and $W_C$ is the minimum weight of the replaced words with respect to the segments of the query string and/or the target string; and comparing the total edit distances of the target strings: the smaller the total edit distance, the higher the degree of matching and the higher the ranking; the larger the total edit distance, the lower the degree of matching and the lower the ranking. It should be noted that the weights may be calculated alternately from the query string and the target string; in the embodiment shown in FIG. 3, the weights of inserted strings are calculated against the query string, and the weights of deleted strings are calculated against the target string. It should be noted that, for the embodiments shown in FIG. 2 and FIG. 3, segmentation may divide a string by any method: into words in the linguistic sense, or into single characters, letters, or symbols. It should be noted that, for the embodiments shown in FIG. 2 and FIG. 3, any form of calculation or transformation may be applied to the weights, such as taking the logarithm; the maximum, the average, or another function of the weights of a target word with respect to the query words may also be used as the weight (weighted length) of that word. It should be noted that, for the embodiments shown in FIG. 2 and FIG. 3, the roles may be reversed, treating the target string as the query string and the query string as the target string, without any essential difference. It should be noted that, for the embodiments shown in FIG. 2 and FIG. 3, the interval over which the word distance or length is calculated may be the entire string or any interval selected by the algorithm, such as the part of one string that differs from the other string. It should be noted that the matching method does not have to use the minimum sliding window or the edit distance; it can be any calculation of a weighted word distance or word length. It should be noted that the present application is not limited to retrieval systems such as search engines and can be applied to any system that calculates the degree of matching between two strings. The present application also discloses an apparatus for sorting search results. Referring to FIG. 4, it specifically includes: a word weight table acquisition module 401, configured to calculate the semantic association weight between every two words in the statistical sample and to obtain and store the word weight table; a word acquisition module 402, configured to receive the query string input by the user terminal, search according to the query string, and obtain the target strings; a segmentation module 403, configured to segment the query string and the target string separately after the server obtains them; a combination module 404, configured to combine each segment of the query string in turn with the segments of the target string, pairwise; a query module 405, configured to query the word weight table and obtain the weight of each segment combination; and a matching module 406, configured to obtain the weighted word length according to the weights, sort the target strings, and feed the result back to the user terminal.
The word weight table acquisition module 401 may specifically include: a sample acquisition module, configured to obtain the statistical sample; a first statistics module, configured to select a first word and a second word from the statistical sample and count the number of times C(first word, second word) that the first word and the second word appear together in the statistical sample; a second statistics module, configured to count the number of times the second word appears in the statistical sample, ΣC(Yi, second word), where Yi denotes each word that co-occurs with the second word; a probability calculation module, configured to calculate the conditional probability of the first word given the second word, P(first word | second word) = C(first word, second word) / ΣC(Yi, second word); a weight calculation module, configured to take, when the second word is queried, the semantic relevance weight of the first word with respect to the second word as W = 1 - P, where W is the weight and P is the conditional probability of the first word given the second word; and a generation module, configured to generate the word weight table after the semantic relevance weight of each word in the statistical sample relative to the other words has been obtained. When the weighted word length is the minimum sliding window weighted length, the matching module 406 may specifically include: a minimum-weight acquisition module, configured to obtain, for each segment of the target string, the minimum of its weights with respect to the segments of the query string, or to obtain, for each segment of the query string, the minimum of its weights with respect to the segments of the target string; a first calculation module, configured to calculate, for each target string, the minimum sliding window weighted length according to the minimum weights; and a sorting module, configured to compare the minimum sliding window weighted lengths of the target strings and rank a target string higher when its length is smaller and lower when its length is larger, that is, the smaller the length, the higher the determined degree of matching. By applying the embodiment shown in FIG. 4 and introducing word weights that represent the semantic relevance between the query string and the target string, the degree of matching between each target string and the query string is reflected more accurately. It is simple to apply and effective in practice. The embodiments of the present application further provide an apparatus for sorting search results. Referring to FIG. 5, it includes: a word weight table acquisition module 501, configured to calculate the semantic association weight between every two words in the statistical sample and to obtain and store the word weight table; a word acquisition module 502, configured to receive the query string input by the user terminal, search according to the query string, and obtain the target strings; a segmentation module 503, configured to segment the query string and the target string separately after the server obtains them; a first minimum-weight calculation module 504, configured to calculate the minimum weights of the inserted words with respect to the segments of the query string; a second minimum-weight calculation module 505, configured to calculate the minimum weights of the deleted words with respect to the segments of the target string; and a matching module 506, configured to calculate the total edit distance according to the minimum weights, sort the target strings, and feed the result back to the user terminal.
The matching module 506 may specifically include: a first total edit distance calculation module, configured to determine the total edit distance for each target string as $W_{total} = W_I + W_D$, where $W_{total}$ is the total edit distance, $W_I$ is the minimum weight of the inserted words with respect to the segments of the query string, and $W_D$ is the minimum weight of the deleted words with respect to the segments of the target string; and a sorting module, configured to compare the total edit distances of the target strings and rank a target string higher when its total edit distance is smaller and lower when it is larger, that is, the smaller the total edit distance, the higher the determined degree of matching. The apparatus shown in FIG. 5 may further include: a third minimum-weight calculation module, configured to obtain the minimum weight of the edit distance of the replaced words before the total edit distance is calculated; in this case, the matching module 506 may specifically include:

a second total edit distance calculation module, configured to determine the total edit distance for each target string as $W_{total} = W_I + W_D + W_C$, where $W_{total}$ is the total edit distance, $W_I$ is the minimum weight of the inserted words with respect to the segments of the query string, $W_D$ is the minimum weight of the deleted words with respect to the segments of the target string, and $W_C$ is the minimum weight of the replaced words with respect to the segments of the query string and/or the target string; and a sorting module, configured to compare the total edit distances of the target strings and rank a target string higher when its total edit distance is smaller and lower when it is larger, that is, the smaller the total edit distance, the higher the determined degree of matching. By applying the apparatus shown in FIG. 5 and introducing word weights that represent the semantic relevance between the query string and the target string, the degree of matching between each target string and the query string is reflected more accurately. It is simple to apply and effective in practice. It should be noted that, for convenience of description, the above apparatus is described as divided into modules by function. Of course, when implementing the present application, the functions of the modules may be implemented in one or more pieces of software and/or hardware. It should be noted that, since the system embodiments are basically similar to the method embodiments, they are described relatively briefly; for relevant details, see the description of the method embodiments. It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between the entities or operations. Furthermore, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device.
In the absence of more restrictions, the elements defined by the phrase "including -..." are not excluded from the process, method, or article -32-201131395 or equipment that includes the element. Further identical elements. It will be apparent to those skilled in the art from the above description of the embodiments that the present application can be implemented by means of a software plus a necessary general hardware platform. Based on this understanding, the present application The technical solution in essence or the contribution to the prior art can be embodied in the form of a software product that can be stored in a storage medium such as a ROM/RAM, a disk, a compact disk, etc., including a number of instructions. To enable a computer device (which may be a personal computer, server, or network device, etc.) to perform the methods described in various embodiments of the present application or in certain portions of the embodiments. This application can be used in many general or special applications. Computing system environment or configuration. For example: personal computer, server computer, handheld device or portable device Multiprocessor systems, microprocessor-based systems, set-top boxes 'programmable consumer electronics, network PCs, small computers, large computers, decentralized computing environments including any of the above systems or devices, etc. This application can Described in the general context of computer-executable instructions executed by a computer, such as a program module. Generally, a program module includes routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Etc. The application can also be practiced in a decentralized computing environment in which tasks are performed by remote processing devices that are connected through a communication network. 
In a distributed computing environment, the programming model The group may be located in local and remote computer storage media including storage devices. -33- 201131395 The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of protection of the present application. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the present application are included in the present application. BRIEF DESCRIPTION OF THE DRAWINGS [Brief Description of the Drawings] In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings to be used in the embodiments will be briefly described below, and obviously, in the following description The drawings are only some of the embodiments of the present application, and those skilled in the art can obtain other drawings according to the drawings without any creative work. Fig. 1 is implemented according to the application. FIG. 2 is a flowchart of a method for sorting search results according to an embodiment of the present application; FIG. 3 is another method for sorting search results according to an embodiment of the present application; FIG. 4 is a schematic diagram of an apparatus for sorting search results according to an embodiment of the present application: FIG. 5 is another apparatus for sorting search results according to an embodiment of the present application. [Main component symbol description] 401: Word weight table acquisition module - 34 - 201131395 402 : Word acquisition module 403 : Word segmentation module 404 : Combination module 4 0 5 : Query module 4 0 6 : Matching module 501: Word weight table acquisition module 502: word acquisition module 5 03 : word segmentation module 5 04 : first weight minimum 値 calculation module 5 05 : second weight minimum 値 calculation module 5 0 6 : matching module -35-
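The total edit distance computed by the matching module of Fig. 5 can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the applicant's implementation: the table layout `table[(word, given)]`, the function names, the default weight of 1.0 for pairs missing from the table (corresponding to P = 0), and the way inserted and deleted words are derived (inserted words are target words absent from the query, deleted words are query words absent from the target) are all hypothetical. The replacement term `w_c` is simply passed in by the caller.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: table[(word, given)] holds the weight W of `word`
# relative to `given`, i.e. W = 1 - P(word | given).
WeightTable = Dict[Tuple[str, str], float]

UNRELATED = 1.0  # assumed weight for pairs missing from the table (P = 0)


def total_edit_distance(
    query: Sequence[str],
    target: Sequence[str],
    table: WeightTable,
    w_c: float = 0.0,
) -> float:
    """Return W_total = W_I + W_D + W_C for one target string.

    Simplifying assumption: word order is ignored when deciding which
    words count as inserted or deleted.
    """
    inserted = [t for t in target if t not in query]
    deleted = [q for q in query if q not in target]

    # W_I: for each inserted word, its smallest weight relative to any query word.
    w_i = sum(
        min((table.get((ins, q), UNRELATED) for q in query), default=UNRELATED)
        for ins in inserted
    )
    # W_D: for each deleted word, the smallest weight of any target word relative to it.
    w_d = sum(
        min((table.get((t, d), UNRELATED) for t in target), default=UNRELATED)
        for d in deleted
    )
    return w_i + w_d + w_c


def rank_targets(
    query: Sequence[str],
    targets: Sequence[List[str]],
    table: WeightTable,
) -> List[List[str]]:
    """Sort target strings so that the smallest total edit distance comes first."""
    return sorted(targets, key=lambda t: total_edit_distance(query, t, table))
```

With made-up weights in which "n73" and "phone" are strongly related to "nokia" and "original" is strongly related to "battery", a target such as "nokia n73 phone original battery" receives a smaller total edit distance than "nokia phone free battery", although both insert several words between the query terms.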

Claims

VII. Patent application scope:

1. A method for sorting search results, characterized in that a server pre-calculates a semantic association weight between every two words in a statistical sample and obtains and saves a word weight table, the method further comprising: the server receiving a query string input by a user terminal, searching according to the query string, and obtaining target strings; the server segmenting the query string and each target string into words, and combining each segmented word of the query string in turn with the segmented words of the target string in pairs; querying the word weight table to obtain a weight value for each combination of segmented words; and obtaining a weighted word length according to the weight values, sorting the target strings according to the weighted word length, and feeding the result back to the user terminal.

2. The method of claim 1, wherein the step in which the server pre-calculates the semantic association weight between every two words in the statistical sample and obtains the word weight table comprises: the server obtaining the statistical sample; selecting a first word and a second word from the statistical sample, and counting the number of times $C(\text{first word}, \text{second word})$ that the first word and the second word co-occur in the statistical sample; counting the number of occurrences of the second word in the statistical sample, $\sum C(Y_i, \text{second word})$, wherein $Y_i$ denotes each word that co-occurs with the second word; calculating the probability of the first word given that the second word occurs, $P(\text{first word} \mid \text{second word}) = C(\text{first word}, \text{second word}) / \sum C(Y_i, \text{second word})$; when the second word is queried, taking the semantic relevance weight of the first word and the second word as $W = 1 - P$, wherein $W$ is the weight and $P$ is the probability of the first word given that the second word occurs; and repeating the above steps to obtain in turn the semantic relevance weight of each word in the statistical sample relative to the other words, thereby obtaining the word weight table.

3. The method of claim 2, wherein the source of the statistical sample includes any form of text or symbols, the text including web page text, user search logs, and user click logs.

4. The method of claim 1, wherein the weighted word length is a minimum sliding window weighted length, and the step of obtaining the weighted word length according to the weight values and sorting the target strings comprises: taking, for each segmented word of the target string, the minimum of its weights relative to the segmented words of the query string, or taking, for each segmented word of the query string, the minimum of its weights relative to the segmented words of the target string; for each target string, calculating the minimum sliding window weighted length from the minimum weights; and comparing the minimum sliding window weighted lengths of the target strings, ranking a target string with a smaller length earlier and one with a larger length later.

5. The method of claim 4, wherein calculating the minimum sliding window weighted length of each target string comprises: minimum sliding window weighted length $= \sum_{i=k}^{h} \min_{j=1}^{m} W(T_i, Q_j)$, where $W$ denotes the weight, $T_i$ denotes the $i$-th segmented word in the target string, $k$ and $h$ denote the start position and end position of the minimum sliding window in the target string, $Q_j$ denotes the $j$-th segmented word in the query string, and $m$ denotes the number of segmented words in the query string.

6. A method for sorting search results, characterized in that a server pre-calculates a semantic association weight between every two words in a statistical sample and obtains and saves a word weight table, the method further comprising: the server receiving a query string input by a user terminal, searching according to the query string, and obtaining target strings; the server segmenting the query string and each target string into words; the server calculating, according to the saved word weight table, the minimum weight of the inserted words relative to the segmented words of the query string; the server calculating, according to the word weight table, the minimum weight of the deleted words relative to the segmented words of the target string; and calculating a total edit distance according to the minimum weights, sorting the target strings according to the total edit distance, and feeding the result back to the user terminal.

7. The method of claim 6, wherein the step of calculating, according to the word weight table, the minimum weight of the inserted words relative to the segmented words of the query string comprises: obtaining, according to the word weight table, the weight of each inserted word relative to each segmented word of the query string; and taking the minimum weight of the inserted words relative to the segmented words of the query string as $W_I = \sum_{t=1}^{n} \min_{j=1}^{m} W(I_t, Q_j)$, where $W$ denotes the weight, $I_t$ denotes the $t$-th segmented word of the inserted words, $n$ denotes the number of inserted segmented words, $Q_j$ denotes the $j$-th segmented word in the query string, and $m$ denotes the number of segmented words in the query string.

8. The method of claim 6, wherein the step of calculating, according to the word weight table, the minimum weight of the deleted words relative to the segmented words of the target string comprises: obtaining, according to the word weight table, the weight of each deleted word relative to each segmented word of the target string; and taking the minimum weight of the deleted words relative to the segmented words of the target string as $W_D = \sum_{d=1}^{p} \min_{i=1}^{q} W(T_i, D_d)$, where $W$ denotes the weight, $T_i$ denotes the $i$-th segmented word in the target string, $q$ denotes the number of segmented words in the target string, $D_d$ denotes the $d$-th segmented word of the deleted words, and $p$ denotes the number of deleted segmented words.

9. The method of claim 6, wherein the step of calculating the total edit distance according to the minimum weights and sorting the target strings comprises: determining, for each target string, the total edit distance $W_{\text{total}} = W_I + W_D$, where $W_{\text{total}}$ denotes the total edit distance, $W_I$ denotes the minimum weight of the inserted words relative to the segmented words of the query string, and $W_D$ denotes the minimum weight of the deleted words relative to the segmented words of the target string; and comparing the total edit distances of the target strings, ranking a target string with a smaller total edit distance earlier and one with a larger total edit distance later.
10. The method of claim 6, wherein before the total edit distance is calculated, the method further comprises: obtaining the minimum weight of the edit distance of the replaced words; and the step of calculating the total edit distance according to the minimum weights, so as to determine the degree of matching between the query string and the target string, comprises: determining, for each target string, the total edit distance $W_{\text{total}} = W_I + W_D + W_C$, where $W_{\text{total}}$ denotes the total edit distance, $W_I$ denotes the minimum weight of the inserted words relative to the segmented words of the query string, $W_D$ denotes the minimum weight of the deleted words relative to the segmented words of the target string, and $W_C$ denotes the minimum weight of the replaced words relative to the segmented words of the query string and/or the target string; and comparing the total edit distances of the target strings, ranking a target string with a smaller total edit distance earlier and one with a larger total edit distance later.

11. The method of claim 10, wherein obtaining the minimum weight of the edit distance of the replaced words comprises: setting the minimum weight of the edit distance of the replaced words equal to a preset fixed value; or setting it equal to the sum, the average, or the larger of the minimum weight of the inserted words relative to the segmented words of the query string and the minimum weight of the deleted words relative to the segmented words of the target string.

12. A device for sorting search results, characterized in that it comprises: a word weight table acquisition module, configured to calculate a semantic association weight between every two words in a statistical sample and to obtain and save a word weight table; a word acquisition module, configured to receive a query string input by a user terminal, search according to the query string, and obtain target strings; a word segmentation module, configured to segment the query string and each target string into words after the server has obtained the query string and the target strings; a combination module, configured to combine each segmented word of the query string in turn with the segmented words of the target string in pairs; a query module, configured to query the word weight table and obtain a weight value for each combination of segmented words; and a matching module, configured to obtain a weighted word length according to the weight values, sort the target strings, and feed the result back to the user terminal.

13. The device of claim 12, wherein the word weight table acquisition module comprises: a sample acquisition module, configured to obtain the statistical sample; a first statistical module, configured to select a first word and a second word from the statistical sample and count the number of times $C(\text{first word}, \text{second word})$ that the first word and the second word co-occur in the statistical sample; a second statistical module, configured to count the number of occurrences of the second word in the statistical sample, $\sum C(Y_i, \text{second word})$, wherein $Y_i$ denotes each word that co-occurs with the second word; a probability calculation module, configured to calculate the probability of the first word given that the second word occurs, $P(\text{first word} \mid \text{second word}) = C(\text{first word}, \text{second word}) / \sum C(Y_i, \text{second word})$; a weight calculation module, configured to take, when the second word is queried, the semantic relevance weight of the first word and the second word as $W = 1 - P$, wherein $W$ is the weight and $P$ is the probability of the first word given that the second word occurs; and a generation module, configured to generate the word weight table after the semantic relevance weight of each word in the statistical sample relative to the other words has been obtained.

14. The device of claim 12, wherein, when the weighted word length is a minimum sliding window weighted length, the matching module comprises: a weight minimum acquisition module, configured to take, for each segmented word of the target string, the minimum of its weights relative to the segmented words of the query string, or to take, for each segmented word of the query string, the minimum of its weights relative to the segmented words of the target string; a first calculation module, configured to calculate, for each target string, the minimum sliding window weighted length from the minimum weights; and a sorting module, configured to compare the minimum sliding window weighted lengths of the target strings, ranking a target string with a smaller length earlier and one with a larger length later.
15. A device for sorting search results, characterized in that it comprises: a word weight table acquisition module, configured to calculate a semantic association weight between every two words in a statistical sample and to obtain and save a word weight table; a word acquisition module, configured to receive a query string input by a user terminal, search according to the query string, and obtain target strings; a word segmentation module, configured to segment the query string and each target string into words after the server has obtained the query string and the target strings; a first weight minimum calculation module, configured to calculate the minimum weight of the inserted words relative to the segmented words of the query string; a second weight minimum calculation module, configured to calculate the minimum weight of the deleted words relative to the segmented words of the target string; and a matching module, configured to calculate a total edit distance according to the minimum weights, sort the target strings, and feed the result back to the user terminal.

16. The device of claim 15, wherein the matching module comprises: a first total edit distance calculation module, configured to determine, for each target string, the total edit distance $W_{\text{total}} = W_I + W_D$, where $W_{\text{total}}$ denotes the total edit distance, $W_I$ denotes the minimum weight of the inserted words relative to the segmented words of the query string, and $W_D$ denotes the minimum weight of the deleted words relative to the segmented words of the target string; and a sorting module, configured to compare the total edit distances of the target strings, ranking a target string with a smaller total edit distance earlier and one with a larger total edit distance later.
17. The device of claim 15, wherein the device further comprises: a third weight minimum calculation module, configured to obtain the minimum weight of the edit distance of the replaced words before the total edit distance is calculated; and the matching module comprises: a second total edit distance calculation module, configured to determine, for each target string, the total edit distance $W_{\text{total}} = W_I + W_D + W_C$, where $W_{\text{total}}$ denotes the total edit distance, $W_I$ denotes the minimum weight of the inserted words relative to the segmented words of the query string, $W_D$ denotes the minimum weight of the deleted words relative to the segmented words of the target string, and $W_C$ denotes the minimum weight of the replaced words relative to the segmented words of the query string and/or the target string; and a sorting module, configured to compare the total edit distances of the target strings, ranking a target string with a smaller total edit distance earlier and one with a larger total edit distance later.
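As an informal illustration of the word weight table of claims 2 and 13 and the minimum sliding window weighted length of claims 4, 5, and 14, the following Python sketch may help. It is a sketch under stated assumptions rather than the claimed implementation: the function names and the `table[(x, y)]` layout are hypothetical; two words are assumed to co-occur once for every sample that contains both; pairs missing from the table are assumed to have weight 1.0 (P = 0); the window is chosen as the shortest span by token count; and a target word identical to a query word is assumed to contribute weight 0, which the claims do not specify.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Optional, Sequence, Tuple

WeightTable = Dict[Tuple[str, str], float]


def build_weight_table(samples: Sequence[Sequence[str]]) -> WeightTable:
    """Return table[(x, y)] = W = 1 - C(x, y) / sum_i C(Y_i, y)."""
    pair_counts: Counter = Counter()
    for tokens in samples:
        # Assumption: one co-occurrence per sample containing both words.
        for x, y in permutations(set(tokens), 2):
            pair_counts[(x, y)] += 1
    totals: Counter = Counter()  # totals[y] = sum over Y_i of C(Y_i, y)
    for (_, y), count in pair_counts.items():
        totals[y] += count
    return {(x, y): 1.0 - count / totals[y] for (x, y), count in pair_counts.items()}


def min_window(target: Sequence[str], query: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Shortest span (k, h) of `target` containing every query word, or None."""
    need = set(query)
    if not need:
        return None
    best: Optional[Tuple[int, int]] = None
    for k in range(len(target)):
        seen = set()
        for h in range(k, len(target)):
            if target[h] in need:
                seen.add(target[h])
            if seen == need:
                if best is None or h - k < best[1] - best[0]:
                    best = (k, h)
                break  # a longer span starting at k cannot be shorter
    return best


def weighted_window_length(
    target: Sequence[str], query: Sequence[str], table: WeightTable
) -> Optional[float]:
    """Sum over the window of each word's smallest weight relative to the query words."""
    window = min_window(target, query)
    if window is None:
        return None
    k, h = window
    total = 0.0
    for word in target[k : h + 1]:
        if word in query:
            continue  # assumption: exact matches add no length
        total += min(table.get((word, q), 1.0) for q in query)
    return total
```

Target strings would then be ranked by ascending `weighted_window_length`, so that a window filled with words closely related to the query counts as shorter than a window of the same size filled with unrelated words.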