TWI647580B - Search filtering method that enhances the matching of text search results - Google Patents

Search filtering method that enhances the matching of text search results

Info

Publication number
TWI647580B
Authority
TW
Taiwan
Prior art keywords
article
search
keyword
semantic database
search result
Prior art date
Application number
TW106118108A
Other languages
Chinese (zh)
Other versions
TW201903626A (en)
Inventor
陳世興
Original Assignee
正修學校財團法人正修科技大學
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 正修學校財團法人正修科技大學
Priority to TW106118108A
Application granted
Publication of TWI647580B
Publication of TW201903626A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a search filtering method that optimizes the match between text and images in search results. A keyword is entered into a search engine, and each source article is parsed to determine whether it contains the keyword: if not, the source article is discarded; if so, it is retained as a preliminary search result article. The labels of the pictures embedded in each preliminary search result article are then parsed to determine whether they contain the keyword: if not, the preliminary search result article is discarded; if so, it is retained as a final search result article. In this way, the text and images of the search results are optimally matched.

Description

Search filtering method that optimizes the match between text and images in search results

The present invention relates to a search filtering method, and in particular to a method that improves how well the articles and pictures in search results match each other, so that the retrieved data is more accurate and undistorted.

With the rapid development of information technology, the information explosion grows by the day and has changed how people look up information, learn, and interact. Among tens of millions of documents or web pages, or even more, material relevant to the topics people care about is obtained through retrieval. The common retrieval methods today all use full-text search: after a keyword or term is entered, every article that contains it is listed by the search engine as an itemized, ordered entry, producing a search result.

Although the search results returned by a search engine contain data matching the keyword or term, not all of them are necessarily what the user needs. The user must therefore screen the returned results further, which is both a nuisance when searching for a topic of interest and quite time-consuming.

Current search methods cannot provide the data that best fits what the searcher wants, which causes difficulty during searching and costs time in filtering data. In view of this, the present inventor developed the present invention.

The main object of the present invention is to provide a search filtering method that optimizes the match between text and images in search results, so that the articles and pictures in the search results match each other optimally and the retrieved data is more accurate and undistorted.

A search filtering method that optimizes the match between text and images in search results comprises the following steps: entering at least one keyword into a search engine; parsing whether a source article contains the keyword, discarding the source article if it does not, and retaining it as a preliminary search result article if it does; and parsing whether the labels of the pictures embedded in the preliminary search result article contain the keyword, discarding the preliminary search result article if they do not, retaining it as a final search result article if they do, and displaying the result on a display device.

In the search filtering method described above, when parsing whether the labels of the pictures embedded in the preliminary search result article contain the keyword, the label information of the pictures is accessed by connecting to the Google Cloud Vision API.

A search filtering method that optimizes the match between text and images in search results comprises the following steps: building a semantic database; entering at least one keyword into a search engine; selecting from the semantic database all synonym terms covered by the keyword; comparing the articles in a source database against all synonym terms covered by the keyword and parsing whether those synonym terms appear in an article, discarding the article if they do not and retaining it as a preliminary search result article if they do; and parsing whether those synonym terms appear in the labels of the pictures embedded in the preliminary search result article, discarding the preliminary search result article if they do not, retaining it as a final search result article if they do, and displaying the title of the final search result article on a display device.

In the search filtering method described above, the step of building the semantic database comprises: importing a training image library; parsing the label content that may appear for each picture in the library; and collecting that label content and storing it in the semantic database.

In the search filtering method described above, in the step of parsing the label content that may appear for each picture in the library, the fitness S_m (m = 1, …, k) of the k labels produced in total by the pictures in the library is computed first, and the value of S_j then determines whether the label is retained (S_j denotes the fitness of figure j). Suppose each picture yields q labels with their corresponding scores (q lies within the range of j). The corresponding score is denoted Y_ij, where i is the picture number (i = 1, …, n and j = 0, …, q), and j = 0 means the picture produced no label. The calculation sums the total score of all k labels; a label that occurs often and with high quality scores obtains a relatively high S_m and can therefore be retained in the semantic database. The formula is as follows:

In the search filtering method described above, the step of building the semantic database comprises: entering a Chinese keyword; and using a translation application to translate the entered Chinese keyword into English and storing it in the semantic database.

In the search filtering method described above, the step of building the semantic database further comprises a step of expanding the semantic database, which comprises finding synonyms through an existing semantic database and adding those synonyms to the semantic database for storage.

In the search filtering method described above, when parsing whether the labels of the pictures embedded in the preliminary search result article contain the keyword, the label information of the pictures is accessed by connecting to the Google Cloud Vision API.

(S11) Step 1

(S12) Step 2

(S13) Step 3

(S21) Step 1

(S22) Step 2

(S23) Step 3

(S31) Step 1

(S32) Step 2

(S33) Step 3

(S41) Step 1

(S42) Step 2

(S51) Step 1

(S52) Step 2

(S53) Step 3

(S54) Step 4

Figure 1: One of the results returned by the Google image search engine for the keyword 「南台地震」 ("Southern Taiwan earthquake")

Figure 2: Flow chart of the steps of the search filtering method of the present invention, which optimizes the match between text and images in search results

Figure 3: Flow chart of the steps of the first way of building the semantic database

Figure 4: Flow chart of the steps of the second way of building the semantic database

Figure 5: Flow chart of the steps of expanding the semantic database with data from an existing semantic database

Figure 6: Flow chart of the steps of text-and-image retrieval performed with the search filtering method of the present invention after the semantic database has been built

Figure 7: Labelling results returned after a polar bear picture is uploaded to the Google Cloud Vision API for recognition

Figure 8: Label analysis of the first picture obtained by entering the keyword "Southern Taiwan earthquake" and searching with the method of the present invention

Figure 9: Label analysis of the second picture obtained by entering the keyword "Southern Taiwan earthquake" and searching with the method of the present invention

Figure 10: Label analysis of the third picture obtained by entering the keyword "Southern Taiwan earthquake" and searching with the method of the present invention

Figure 11: First schematic diagram showing labelled data of pictures contained in an article that does not match the entered keyword "Southern Taiwan earthquake"

Figure 12: Second schematic diagram showing labelled data of pictures contained in an article that does not match the entered keyword "Southern Taiwan earthquake"

Figure 13: Third schematic diagram showing labelled data of pictures contained in an article that does not match the entered keyword "Southern Taiwan earthquake"

So that the technical content employed by the present invention, its purpose, and the effects it achieves are disclosed more completely and clearly, they are described in detail below; please also refer to the accompanying drawings and reference numerals:

When data is searched with a common search engine such as Google, Yahoo, Openfind, or Bing, a keyword is typed into the search engine, which lists related information according to that keyword. Although every returned item contains the entered keyword and so qualifies as a search result, not every item is what the searcher needs. For example, when a searcher enters the keyword 「南台地震」 ("Southern Taiwan earthquake") into the Google search engine to gather material on earthquake disaster relief in southern Taiwan, the results include unrelated material in addition to articles on disaster relief and the earthquake. Figure 1 shows one of the results returned by the Google image search engine for the keyword "Southern Taiwan earthquake": the headline of this news item contains the entered keyword, but the item has no direct bearing on the disaster relief content the searcher cares about. This kind of search gap is also common in image retrieval.

To address the insufficient precision of conventional search engines, the present invention proposes a more effective and precise search method for text-and-image retrieval. Source articles are first filtered by whether they contain the keyword entered into the search engine, and each is discarded or retained accordingly. The labels of the pictures appearing in each retained item are then checked for the entered keyword: if the labels do not contain the keyword, the item is discarded; if they do, the item is retained and becomes one of the final search results.

Please refer to Figure 2. The present invention relates to a search filtering method that optimizes the match between text and images in search results, and the method comprises at least the following steps:
Step 1 (S11): enter at least one keyword into a search engine;
Step 2 (S12): parse whether the source article contains the keyword; if not, discard the source article; if so, retain it as a preliminary search result article;
Step 3 (S13): parse whether the labels of the pictures embedded in the preliminary search result article contain the keyword; if not, discard the preliminary search result article; if so, retain it as one of the final search result articles, and display the title of the final search result article on a display device.
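Steps S11 to S13 can be sketched in Python as a two-stage filter. This is a minimal illustration, not the patented implementation: the `Article` record, the `contains` helper, and the case-insensitive substring rule are all assumptions made for the example, and image labels are taken as already extracted.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical record: title, body text, and one label list per embedded picture."""
    title: str
    text: str
    image_labels: list[list[str]] = field(default_factory=list)


def contains(keyword: str, haystack: str) -> bool:
    """Case-insensitive substring test (an assumed matching rule)."""
    return keyword.casefold() in haystack.casefold()


def filter_articles(keyword: str, sources: list[Article]) -> list[str]:
    # S12: keep only articles whose text contains the keyword.
    preliminary = [a for a in sources if contains(keyword, a.text)]
    # S13: keep only articles with at least one picture label containing the keyword.
    final = [
        a for a in preliminary
        if any(contains(keyword, label) for labels in a.image_labels for label in labels)
    ]
    # The titles of the final results are what would be displayed.
    return [a.title for a in final]
```

An article whose text mentions the keyword but whose pictures are labelled only with unrelated terms is dropped at the second stage, which is the behaviour the method relies on.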

Because the annotations that the Google Cloud Vision API produces for a picture can vary in many ways, and because different people describe an image in different words, a semantic gap arises. To solve this problem, the present invention further provides a step of building a semantic database: before the text-and-image search is actually carried out, a semantic database for that search is built.

The step of building the semantic database is performed before Step 1 (S11), entering at least one keyword into a search engine, or before Step 2 (S12), parsing whether the source article contains the keyword. The semantic database can be built in two ways: (1) importing a small number of pictures to build the required semantic database, or (2) entering keywords directly to build the required semantic database.

For the first way, please refer to Figure 3. Its steps are:
Step 1 (S21): import a training image library;
Step 2 (S22): parse the label content that may appear for each picture in the library. In this step, the fitness S_m (m = 1, …, k) of the k labels produced in total by the pictures in the library is computed first, and the value of S_j then determines whether the label is retained (S_j denotes the fitness of figure j). Suppose each picture yields q labels with their corresponding scores (q lies within the range of j). The corresponding score is denoted Y_ij, where i is the picture number (i = 1, …, n and j = 0, …, q), and j = 0 means the picture produced no label. The calculation sums the total score of all k labels; a label that occurs often and with high quality scores obtains a relatively high S_m and can therefore be retained in the semantic database. The formula is as follows:

Step 3 (S23): collect that label content and store it in the semantic database.
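The fitness calculation in Step 2 (S22) can be illustrated with a short Python sketch. The formula itself is not reproduced in this text, so the sketch assumes the simplest reading of the description: S_m is the sum of the scores Y_ij that label m receives over all pictures. The function names and the `threshold` cut-off are hypothetical.

```python
from collections import defaultdict


def label_fitness(pictures: list[list[tuple[str, float]]]) -> dict[str, float]:
    """Sum each label's scores over all pictures (assumed form of S_m)."""
    fitness: dict[str, float] = defaultdict(float)
    for labels in pictures:          # i = 1..n; an empty list is the j = 0 case
        for label, score in labels:  # j = 1..q; score plays the role of Y_ij
            fitness[label] += score
    return dict(fitness)


def build_semantic_db(pictures: list[list[tuple[str, float]]], threshold: float) -> set[str]:
    """Keep labels whose summed score reaches a hypothetical threshold."""
    return {label for label, s in label_fitness(pictures).items() if s >= threshold}
```

Under this reading, a label that appears in many pictures with high scores accumulates a large sum and is kept, which matches the qualitative behaviour the step describes.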

For the second way of building the semantic database, please refer to Figure 4. Its steps are:
Step 1 (S31): enter a Chinese keyword;
Step 2 (S32): use a translation application to translate the entered Chinese keyword into English;
Step 3 (S33): collect the translated English and store it in the semantic database.

In addition, data from an existing semantic database can be used to expand the semantic database of the present invention. Please refer to Figure 5. The steps are:
Step 1 (S41): find synonyms through an existing semantic database (for example, WordNet);
Step 2 (S42): add the synonyms to the semantic database for storage.
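The keyword-based build (S31 to S33) and the expansion step (S41 to S42) can be sketched together. To stay self-contained, the sketch takes the translator as a caller-supplied function and the existing semantic database as a plain mapping; both stand in for a real translation application and for WordNet, and all names here are hypothetical.

```python
from typing import Callable, Mapping


def add_translated_keyword(db: set[str], keyword: str, translate: Callable[[str], str]) -> None:
    """S31-S33: translate the Chinese keyword and store the English term."""
    db.add(translate(keyword).casefold())


def expand_with_synonyms(db: set[str], thesaurus: Mapping[str, list[str]]) -> None:
    """S41-S42: add every synonym the thesaurus lists for terms already stored."""
    for term in list(db):  # iterate over a copy, because db grows during the loop
        db.update(s.casefold() for s in thesaurus.get(term, []))
```

Passing the translator and thesaurus in as arguments keeps the sketch independent of any particular service; a real system would substitute its own lookups.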

Once the semantic database has been built, text-and-image retrieval can be performed (see Figure 6). After a keyword is entered, the system selects the synonym terms stored for it in the semantic database and compares the articles in a source database against those terms (keywords). If an article contains a required keyword (including synonym terms), the labels of the article's pictures are parsed; if the parsed labels contain a required keyword (including synonym terms), the system displays the article; otherwise the article is discarded.

Specifically, once the semantic database has been built, text-and-image retrieval with the search filtering method of the present invention comprises the following steps:
Step 1 (S51): enter at least one keyword into a search engine;
Step 2 (S52): select from the semantic database all synonym terms covered by the keyword;
Step 3 (S53): compare the articles in the source database against all synonym terms covered by the keyword and parse whether those synonym terms appear in the source article; if not, discard the source article; if so, retain it as a preliminary search result article;
Step 4 (S54): parse whether those synonym terms appear in the labels of the pictures embedded in the preliminary search result article; if not, discard the preliminary search result article; if so, retain it as one of the final search result articles, and display the title of the final search result article on a display device.
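Steps S51 to S54 extend the basic filter with synonym terms, and a minimal Python sketch follows. It assumes that a match on any one synonym term is sufficient (the text could also be read as requiring all of them), that matching is a case-insensitive substring test, and that each source record is a hypothetical `(title, body, image_labels)` tuple.

```python
def search_terms(keyword: str, semantic_db: dict[str, set[str]]) -> set[str]:
    """S52: the keyword itself plus the synonym terms stored for it."""
    return {keyword.casefold()} | {s.casefold() for s in semantic_db.get(keyword, set())}


def matches(terms: set[str], texts: list[str]) -> bool:
    """True when any term occurs in any text (assumed 'any', not 'all')."""
    return any(t in text.casefold() for t in terms for text in texts)


def search(
    keyword: str,
    semantic_db: dict[str, set[str]],
    source_db: list[tuple[str, str, list[str]]],
) -> list[str]:
    terms = search_terms(keyword, semantic_db)
    titles = []
    for title, body, image_labels in source_db:
        if not matches(terms, [body]):        # S53: text check
            continue
        if not matches(terms, image_labels):  # S54: picture-label check
            continue
        titles.append(title)                  # titles of final results are displayed
    return titles
```

Because the synonym set is computed once per query, both filtering stages use the same terms, which keeps the text check and the label check consistent.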

The purpose of obtaining image annotations or labels (labelling) is to compare them against the entered keyword, so as to confirm that, beyond the article's text containing matching keyword content, its pictures also fit the subject of that keyword, thereby improving how well text and images match in the search results.

Labelling is an important pattern in machine learning. It requires importing a large number of training pictures for machine learning algorithms to learn from; in particular, if the pictures corresponding to a new search condition have not been trained on, the pictures cannot be recognized and no labels can be provided. A large amount of training data is therefore needed before labelling can be performed. Because training a machine consumes considerable time and cost, the present invention addresses this problem by using the Google Cloud Vision API, which provides picture labelling (label detection).

Figure 7 shows a polar bear picture provided as a test case from ImageNet [J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009, pp. 248-255]. The picture was uploaded to the Google Cloud Vision API for recognition, and three labelling results were returned; the API returns the specified number of results as JSON data.

The test result in Figure 7 shows that the Google Cloud Vision API correctly identifies the picture as a polar bear, with a 98% match. The picture may also be a mammal or an animal, with confidence levels of 96% and 95% respectively. These confidence levels can be used to weight how much each label source is trusted.
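Reading labels and confidence scores out of the returned JSON can be sketched as follows. The sample body is illustrative only: its three entries mirror the percentages reported for Figure 7, and it assumes the response follows the label-detection shape of the Cloud Vision REST API (`responses`, `labelAnnotations`, `description`, `score`). The `min_score` cut-off is a hypothetical parameter.

```python
import json

# Illustrative response body; the scores mirror the percentages quoted for Figure 7.
SAMPLE = """
{"responses": [{"labelAnnotations": [
  {"description": "polar bear", "score": 0.98},
  {"description": "mammal", "score": 0.96},
  {"description": "animal", "score": 0.95}
]}]}
"""


def extract_labels(response_json: str, min_score: float = 0.0) -> list[tuple[str, float]]:
    """Return (description, score) pairs whose score is at least min_score."""
    payload = json.loads(response_json)
    labels = []
    for response in payload.get("responses", []):
        for ann in response.get("labelAnnotations", []):
            if ann["score"] >= min_score:
                labels.append((ann["description"], ann["score"]))
    return labels
```

Keeping the score next to each label lets later stages weight or discard low-confidence labels before they are compared with the keyword.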

Figures 8 to 10 show the analysis results for pictures related to the Southern Taiwan earthquake. The left side of each figure shows a picture related to the keyword together with a link to the news item, and the right side shows the result after labelling.

Figures 8 to 10 show that when a labelled picture carries related labels such as Earthquake, Rubbles, or Disaster, the item is very likely what the searcher needs. Conversely, if the labelling result does not match the search keyword, the item is probably unrelated, so articles that do not match the keyword can be ignored. Figures 11 to 13 show the labelling results of pictures contained in articles whose labels do not match the search keyword; the pictures shown in Figures 11 to 13 are therefore discarded and do not appear in the search results.

In other words, the present invention uses the artificial-intelligence-based Google Cloud Vision API to access the label information of pictures, and filters by comparing the keyword against that label information, achieving precise search.

The above are only some embodiments of the present invention and are not intended to limit it; anything made with slight variation or modification in accordance with the inventive spirit and features of the present invention shall also fall within the scope of this patent.

In summary, the embodiments of the present invention do achieve the expected effects, and the specific technical means disclosed have neither been seen in similar products nor been made public before the application. The invention fully complies with the provisions and requirements of the Patent Act; an application for an invention patent is therefore filed in accordance with the law, and examination and grant of the patent are respectfully requested.

Claims (5)

一種提升文圖搜尋結果相契合最佳化的搜尋過濾方法,其步驟包括:步驟一:建立一語意庫;步驟二:在一搜尋引擎輸入至少一關鍵字;步驟三:選擇一語意庫中與所述關鍵字所包含的所有同義字詞彙;步驟四:將一來源資料庫中的文章與所述關鍵字所包含的所有同義字詞彙進行比對,解析所述文章中是否出現所述關鍵字所包含的所有同義字詞彙,若無,捨棄所述文章,若有,保留所述文章成為初步檢索結果文章;步驟五:解析所述初步檢索結果文章中所夾帶的圖片的標籤中是否出現所述關鍵字所包含的所有同義字詞彙,若無,捨棄所述初步檢索結果文章,若有,保留所述初步檢索結果文章以成為最終檢索結果文章,並將所述最終檢索結果文章的標題於顯示裝置顯示;其中,所述建立語意庫的步驟包括:匯入訓練圖庫的步驟、解析所述圖庫中各圖片可能出現的標籤內容的步驟、收集所述可能出現的標籤內容並儲存於語意庫的步驟;在所述解析所述圖庫中各圖片可能出現的標籤內容的步驟中,係先計算所述圖庫中的多張圖片總共產生k個標籤的適合度Sm,且m=1,…,k,之後由Sj的高低決定是否保留該標籤(Sj表示圖j的適合度);假設每張圖片取得q個標籤內容與其對應分數(q出現在j的範圍中),此對應分數以Yij表示,其中i為圖片編號(i=1,…,n以及j=0,…,q),當j為0代表該圖片並未產出任何標籤,在計算時加總所有k個標籤的總分,如果出現次數多且品質分數高,所計算出的Sm就會比較高,因此該標籤代表這是可保留至語意庫,其運算公式如下:
Figure TWI647580B_C0001
A search filtering method that improves the matching of search results of text images includes the following steps: Step 1: Establish a semantic database; Step 2: Enter at least one keyword in a search engine; Step 3: Select a semantic database and All the synonyms words included in the keyword; Step 4: compare the articles in a source database with all the synonyms words included in the keyword, and analyze whether the keywords appear in the article All the synonyms with the same vocabulary, if not, discard the article, and if so, keep the article as the preliminary search result article; Step 5: Analyze whether the tags appear in the tags of the pictures carried in the preliminary search result article All the synonyms words included in the keywords, if not, discard the preliminary search result article, if so, keep the preliminary search result article to become the final search result article, and put the title of the final search result article in The display device displays; wherein, the step of creating a semantic database includes the steps of importing a training library, and parsing the images in the library The steps of possible tag content, the step of collecting the possible tag content and storing it in the semantic database; in the step of parsing the tag content that may appear in each picture in the gallery, the gallery is first calculated The multiple images of generate a total of k labels with a suitability of S m , and m = 1, ..., k, and then the level of S j determines whether to retain the label (S j represents the suitability of figure j); suppose each image Obtain q label contents and their corresponding scores (q appears in the range of j). This corresponding score is represented by Y ij , where i is the picture number (i = 1, ..., n and j = 0, ..., q), when When j is 0, the image does not produce any tags. The total score of all k tags is added during the calculation. 
If there are many occurrences and the quality score is high, the calculated S m will be relatively high, so the tag represents This is reserved for the semantic database, and its calculation formula is as follows:
Figure TWI647580B_C0001
如申請專利範圍第1項所述之提升文圖搜尋結果相契合最佳化的搜尋過濾方法,其中,所述建立語意庫的步驟係包括:輸入中文關鍵字的步驟、以及以一翻譯應用程式將所述輸入中文關鍵字轉譯為英文並儲存於語意庫的步驟。As mentioned in the first paragraph of the patent application scope, the search filtering method for improving the search results of the text matches the optimization, wherein the step of creating a semantic database includes the steps of inputting Chinese keywords and a translation application The step of translating the input Chinese keywords into English and storing in the semantic database. 如申請專利範圍第1或2項所述之提升文圖搜尋結果相契合最佳化的搜尋過濾方法,其中,所述建立語意庫的步驟還包括一擴充語意庫的步驟,包括透過既有語意庫尋找同義字的步驟、將所述同義字加入所述語意庫中儲存。As described in item 1 or 2 of the patent application scope, the search filter method for improving the search results of the text image matches the optimization, wherein the step of creating a semantic database also includes a step of expanding the semantic database, including through the existing semantic The step of the library searching for synonyms is to add the synonyms to the semantic database for storage. 如申請專利範圍第3項所述之提升文圖搜尋結果相契合最佳化的搜尋過濾方法,其中,在解析所述初步檢索結果文章中所夾帶的圖片的標籤中是否包含所述關鍵字的過程中,是透過連結Google Cloud Vision API存取所述圖片中的標籤資訊。As mentioned in item 3 of the patent application scope, the search filter method for improving the search results of the text image is optimized, wherein whether the keyword is included in the tag of the picture carried in the article of the preliminary search result is analyzed In the process, the tag information in the picture is accessed by linking to the Google Cloud Vision API. 
The search filtering method that improves the match between the text and the images of search results according to claim 1 or 2, wherein, in determining whether the tags of the pictures carried in the preliminary search result article contain the keyword, the tag information of the pictures is accessed by connecting to the Google Cloud Vision API.
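Steps 4 and 5 of claim 1 amount to a two-stage filter: first on the article text, then on the tags of the pictures the article carries. The Python sketch below illustrates that flow under several assumptions that are not in the claims: an article passes a stage if at least one synonymous word matches, matching is done on lowercase single-word tokens, and picture tags come from a hypothetical `get_labels` callable. The claims name the Google Cloud Vision API as the tag source, but no real API call is made here; a small lookup table stands in for it.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    text: str
    image_urls: list = field(default_factory=list)


def filter_articles(articles, synonyms, get_labels):
    """Return titles of articles whose text and picture tags both match.

    synonyms: words with the same meaning as the keyword (step 3 output).
    get_labels: hypothetical callable, image URL -> list of tag strings.
    """
    words = {w.lower() for w in synonyms}
    final_titles = []
    for article in articles:
        # Step 4: discard articles whose text contains none of the words.
        tokens = set(re.findall(r"\w+", article.text.lower()))
        if not tokens & words:
            continue
        # Step 5: discard articles whose picture tags contain none of them.
        tags = set()
        for url in article.image_urls:
            tags.update(tag.lower() for tag in get_labels(url))
        if tags & words:
            final_titles.append(article.title)
    return final_titles


# Hypothetical tag lookup standing in for an image-labelling service.
fake_labels = {"a.jpg": ["Cat", "Sofa"], "b.jpg": ["Car"]}
articles = [
    Article("Cats at home", "My kitten sleeps all day.", ["a.jpg"]),
    Article("Road trip", "A kitten poster was in the car.", ["b.jpg"]),
    Article("Gardening", "Tomatoes need sun.", ["a.jpg"]),
]
titles = filter_articles(
    articles, {"cat", "kitten", "feline"}, lambda url: fake_labels.get(url, [])
)
```

In this toy run only the first article survives both stages: the second mentions a synonym in its text but its picture is tagged "Car", and the third never mentions the keyword at all.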
Priority Applications (1)

Application Number Priority Date Filing Date Title
TW106118108A TWI647580B (en) 2017-06-01 2017-06-01 Search filtering method that enhances the matching of text search results

Publications (2)

Publication Number Publication Date
TWI647580B TWI647580B (en) 2019-01-11
TW201903626A TW201903626A (en) 2019-01-16

Family

ID=65803504

Family Applications (1)

Application Number Title Priority Date Filing Date
TW106118108A TWI647580B (en) 2017-06-01 2017-06-01 Search filtering method that enhances the matching of text search results

Country Status (1)

Country Link
TW (1) TWI647580B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140074825A1 (en) * 2011-02-04 2014-03-13 Kodak Alaris Inc. Identifying particular images from a collection
TW201528173A (en) * 2014-01-09 2015-07-16 Alibaba Group Services Ltd Method and device for searching and displaying merchandise images
CN104881451A (en) * 2015-05-18 2015-09-02 百度在线网络技术(北京)有限公司 Image searching method and image searching device
TW201539212A (en) * 2014-04-11 2015-10-16 Univ Chia Nan Pharm & Sciency Construction method, database system, computer program, and computer readable recoding medium of relevant literature inquiry

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees