TWI812243B - Method for produce extended web address and electronic device - Google Patents

Info

Publication number
TWI812243B
Authority
TW
Taiwan
Prior art keywords
url
processor
mentioned
extended
string
Prior art date
Application number
TW111119792A
Other languages
Chinese (zh)
Other versions
TW202347142A (en)
Inventor
周生傑
陳良其
Original Assignee
宏碁股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 宏碁股份有限公司
Priority to TW111119792A
Application granted
Publication of TWI812243B
Publication of TW202347142A

Landscapes

  • Small-Scale Networks (AREA)
  • Character Discrimination (AREA)
  • Character Input (AREA)

Abstract

A method for producing an extended web address is provided. The method includes using a processor to recognize at least one string and an original web address in an image by optical character recognition. The original web address includes path information of a current level and path information of an upper level above the current level. The path information of the upper level corresponds to a directory and a directory web page. The method further includes using, by the processor, the string recognized in the image or the links recorded in the directory web page to produce at least one extended web address related to the original web address.

Description

Method and electronic device for generating an extended URL

The present invention relates to the field of network technology, and in particular to a method for generating an extended URL.

Optical character recognition (OCR) is a technology for recognizing text in images. Many applications on current mobile phone systems include OCR, such as Line and Google Lens. These applications can read the text in a picture and, when they detect a URL, automatically mark it as a hyperlink for the user to select. However, if the URL in the picture is incomplete, current OCR functions recognize only the characters that actually appear and therefore produce a wrong link, so the user cannot find the web page. If the website handles wrong URLs specially, the user is usually redirected to the website's home page. Either way, whether the page is not found or the user is redirected to the home page, the user cannot reach the page they originally wanted to open. Worse, if the URL consists of a hash code or is very long, the user may have difficulty searching for it and may never find the page again. The present invention therefore proposes a method that expands a detected URL so as to fill in its missing part.

Embodiments of the present invention relate to a method for generating an extended URL. The method includes using a processor to recognize at least one string and an original URL in a picture by optical character recognition (OCR). The original URL includes path information of a current level and path information of an upper level above the current level, and the path information of the upper level corresponds to a directory and a directory web page. The method further includes the processor using the string recognized in the picture, or at least one link recorded in the directory web page, to generate at least one extended URL related to the original URL.

In some embodiments, the method further includes determining whether the path information of the current level exceeds a predetermined number of characters. In response to determining that it does, the method determines whether the first n characters of the string match the path information of the current level, where n is the number of characters in the path information of the current level. In response to the first n characters of the string matching the path information of the current level, the first n characters are removed from the string and the remainder is appended after the path information of the current level of the original URL, producing at least one candidate URL for the extended URL.

In some embodiments, the method further includes determining whether the HTTP status code of the web page corresponding to the candidate URL is 200 OK. In response to the HTTP status code being 200 OK, the processor uses that candidate URL as the extended URL.

In some embodiments, the method further includes determining whether the HTTP status code of the directory web page is 200 OK. In response to the HTTP status code of the directory web page being 200 OK, the processor uses the links recorded in the directory web page to generate the extended URL. In some embodiments, the URLs of all links recorded in the directory web page are used as candidate URLs for the extended URL. In some embodiments, the processor uses the candidate URLs that contain the original URL as the extended URL.

In some embodiments, the method further includes displaying the extended URL on a screen. In some embodiments, the path information of the current level is the part of the original URL after its last slash character.

Embodiments of the invention also relate to an electronic device. The electronic device includes a processor, a memory, and a network interface. The memory contains one or more machine-readable instructions that are read and executed by the processor. The network interface connects the electronic device to a network. When the machine-readable instructions are read and executed by the processor, they cause the processor to perform the method for generating an extended URL according to the embodiments of the present invention.

Certain terms are used in the specification and claims to refer to specific components. Those skilled in the art will understand that hardware manufacturers may use different names for the same component. This specification and the claims distinguish components by their differences in function, not by differences in name. The terms "comprise" and "include" used throughout the specification and claims are open-ended and should be interpreted as "including but not limited to". The term "approximately" means that, within an acceptable error range, those skilled in the art can solve the stated technical problem and achieve the basic technical effect. In addition, the term "coupled" in this specification covers any direct or indirect means of electrical connection. Therefore, if a first device is described as coupled to a second device, the first device may be electrically connected to the second device directly, or indirectly through other devices or connection means.

Referring to FIG. 1, FIG. 1 is a block diagram of an electronic device 100 according to an embodiment of the present invention. The electronic device 100 includes a processor 102, a memory 104, a network interface 106, a camera 108, and a screen 110, which communicate with each other through one or more communication buses or signal lines. The electronic device 100 may be any suitable electronic device, including but not limited to a laptop, a tablet, a smartphone, a media player, a personal digital assistant, or another similar device, or a combination of two or more of these devices. It should be understood that the electronic device 100 is only one example of an electronic device; it may have more or fewer components than shown in the drawings, or a different arrangement of components. The components shown in FIG. 1 may be implemented in hardware, software, or a combination of hardware and software, for example as one or more signal processing integrated circuits and/or application-specific integrated circuits.

The processor 102 controls the operation of the electronic device 100. The processor 102 provides the processing power required to run the operating system, programs, graphical user interface, software, modules, applications, and functions of the electronic device 100. The processor 102 may be a single processor or may include a plurality of processors. For example, the processor 102 may include a central processing unit, a general-purpose microprocessor, a combination of general-purpose microprocessors and special-purpose processors, and/or an associated chipset. Examples of combinations of general-purpose microprocessors and special-purpose processors are instruction set processors, graphics processors, video processors, audio processors, and special-purpose microprocessors.

Information used by the processor 102 is stored in the memory 104. The memory 104 stores the data the processor 102 needs to operate and other data required by the electronic device 100. For example, the memory 104 stores image data, multimedia files (such as music or video files), wireless connection information (information that allows the electronic device 100 to establish a wireless connection, such as a network connection), and any other suitable data.

The memory 104 may include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk, optical computer-readable media, magnetic computer-readable media, solid-state computer-readable media, and combinations thereof. The memory 104 may also include volatile memory, such as dynamic random access memory and static random access memory.

Machine-readable instructions 105 are stored in the memory 104 and can be read by the processor 102, causing the processor 102 to perform a plurality of operations. The machine-readable instructions 105 may be written in, for example, C, C++, Python, or machine language.

The network interface 106 enables the processor 102 to connect to the Internet 107. The network interface 106 includes but is not limited to a wireless local area network (WLAN) interface, an Unstructured Supplementary Service Data (USSD) interface, a local area network interface, and a wide area network interface.

The camera 108 can capture images. The camera 108 may be, but is not limited to, a still camera, a video camera, or a camera lens. The screen 110 can display images; for example, the screen 110 can display a specific image under the control of the processor 102. The screen 110 may be, but is not limited to, a liquid crystal panel or a touch screen.

The processor 102 is coupled to the memory 104, the network interface 106, the camera 108, and the screen 110, and can control these components or exchange information with them. The processor 102 connects to the Internet through the network interface 106 in order to connect to a remote server. The processor 102 sends requests to the remote server through the network interface 106 and receives the messages with which the remote server responds. The processor 102 runs the machine-readable instructions 105 in the memory 104 to perform the method for generating an extended URL of the present invention, which is described in detail below.

FIG. 2 is a flow chart of an embodiment of the present invention, schematically illustrating a method 200 for generating an extended URL. The method 200 may be performed by the processor 102 running the machine-readable instructions 105 in the memory 104. In operation 202, the processor 102 receives a picture. The processor 102 may receive the picture (or image file) from the camera 108, or it may read a picture (or image file) stored in the memory 104.

In operation 204, the processor 102 uses OCR to recognize a plurality of strings in the picture and identifies a URL among those strings. For example, the processor 102 may first separate the background of the picture from the text portion (which may contain letters, numbers, and punctuation marks). The processor 102 then segments every letter, number, and punctuation mark in the text portion into an individual character image. Next, the processor 102 extracts the features of each character image and compares them with the characters in a character database to determine which letter, number, or symbol the character image represents. After recognizing the letters, numbers, and symbols in the picture, the processor 102 outputs them and stores them in the memory 104. The processor 102 may recombine the recognized letters, numbers, and symbols into the strings that appear in the picture and store those strings. In the embodiments of the present invention, a character is a single letter, number, or symbol, and a string is composed of multiple characters. In some embodiments, the processor 102 runs an application with an OCR function, such as Line or Google Lens, to read the text in the picture.

The processor 102 then identifies at least one of the strings as a uniform resource locator (URL). For example, the processor 102 may determine whether the first several characters of a string are "http://", "www", or "http://www.", and whether its last several characters are a specific string such as ".com" or ".gov". The processor 102 may also determine whether the string contains characters such as a dot (".") or a slash ("/"). If the conditions are met, the processor 102 determines that the string is a URL (the recognized URL is hereinafter called the original URL), associates that determination with the string, and stores it in the memory 104. The original URL recognized by the processor 102 may be a URL through which the processor 102 can successfully reach a web page, or a URL through which the processor 102 cannot reach any web page.
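The URL test described above can be sketched in a few lines of Python. This is only an illustrative heuristic, not the patented implementation: the names `URL_PREFIXES`, `URL_SUFFIXES`, `looks_like_url`, and `find_urls` are hypothetical, "https://" is added to the prefix list as an assumption, and the way the prefix, suffix, and separator checks are combined is one possible reading of the text.

```python
# Hypothetical heuristic for picking URLs out of OCR strings.
# The prefix/suffix lists are illustrative; "https://" is an added assumption.
URL_PREFIXES = ("http://", "https://", "www.")
URL_SUFFIXES = (".com", ".gov")


def looks_like_url(text: str) -> bool:
    """Return True if the string passes the prefix/suffix/separator checks."""
    s = text.strip().lower()
    has_prefix = s.startswith(URL_PREFIXES)  # e.g. "http://" or "www."
    has_suffix = s.endswith(URL_SUFFIXES)    # e.g. ".com" or ".gov"
    has_dot = "." in s                       # a URL needs at least one dot
    return has_dot and (has_prefix or has_suffix)


def find_urls(strings: list[str]) -> list[str]:
    """Keep only the OCR strings that look like URLs (the 'original URLs')."""
    return [s for s in strings if looks_like_url(s)]
```

A possibly truncated URL such as `https://www.acer.com/ac/zh/TW/content/ces-aw` still passes this check, which is the case the rest of the method is designed to handle.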

The original URL recognized by the processor 102 may contain multiple pieces of path information, separated from one another by slash characters. From the path information in the original URL, a root directory URL and/or at least one subdirectory URL of the original URL can be constructed, and the path levels and subdirectories are related hierarchically, one above the other. For example, the original URL may contain a root directory path, a first-level sub-path, a second-level sub-path, and a current path. The part of the original URL after its last slash character (the first slash counting from the right) is called the current path, that is, the path information of the current level. The part between the second-to-last slash character (the second slash counting from the right) and the last slash character is called the N-level sub-path; here the N-level sub-path denotes the level immediately above the current path. The part between the third-to-last and second-to-last slash characters is called the (N-1)-level sub-path, and so on. The part from "http://" up to the first slash character after it is called the root directory path.

The root directory path is the root directory URL of the original URL. Appending the first-level sub-path to the root directory path, separated by a slash character, gives the first-level subdirectory URL of the original URL. Appending the first-level and second-level sub-paths to the root directory path, separated by slash characters, gives the second-level subdirectory URL. By analogy, appending the first-level through N-level sub-paths in order, separated by slash characters, gives the N-level subdirectory URL, and appending the first-level sub-path through the current path in order gives the original URL itself. Each piece of path information corresponds to a subdirectory URL or the root directory URL, and each subdirectory URL and the root directory URL may correspond to a web page and/or a directory. The root directory path and the first-level through N-level sub-paths are all upper-level path information of the current path (the path information of the upper levels). The root directory URL and the first-level through N-level subdirectory URLs are all upper-level URLs of the original URL.

In other words, removing the part of the original URL after its last slash character gives the current subdirectory URL of the original URL. Removing the part after the second-to-last slash character gives the N-level subdirectory URL, removing the part after the third-to-last slash character gives the (N-1)-level subdirectory URL, and so on. For example, when the original URL is https://www.acer.com/ac/zh/TW/content/ces-awarded-products, the root directory path is https://www.acer.com/, the first-level sub-path is ac, the second-level sub-path is zh, the third-level sub-path is TW, the fourth-level sub-path is content, and the current path is ces-awarded-products. The root directory path is the root directory URL. As described above, the first-level subdirectory URL is https://www.acer.com/ac/, the second-level subdirectory URL is https://www.acer.com/ac/zh/, the third-level subdirectory URL is https://www.acer.com/ac/zh/TW/, and the fourth-level subdirectory URL is https://www.acer.com/ac/zh/TW/content/.
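The decomposition in the example above can be reproduced with a short Python sketch. The function name `split_levels` and its return shape are hypothetical, chosen only for illustration; the sketch splits on slash characters exactly as the text describes and does not attempt full URL parsing (query strings and fragments are ignored).

```python
def split_levels(url: str) -> tuple[list[str], str]:
    """Split a URL into its upper-level URLs and its current path.

    Returns (upper_level_urls, current_path). The first upper-level URL is
    the root directory URL; the last is the directory holding the current path.
    """
    scheme, sep, rest = url.partition("://")
    prefix = scheme + sep if sep else ""  # keep "https://" when present
    if not sep:
        rest = url                        # URL was written without a scheme
    host, slash, path = rest.partition("/")
    segments = path.split("/") if slash else [""]
    *directories, current = segments      # last segment is the current path
    upper = [f"{prefix}{host}/"]          # root directory URL
    for name in directories:
        upper.append(upper[-1] + name + "/")  # append one level at a time
    return upper, current
```

Calling `split_levels("https://www.acer.com/ac/zh/TW/content/ces-awarded-products")` yields the root directory URL followed by the first-level through fourth-level subdirectory URLs listed above, together with the current path `ces-awarded-products`.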

In operation 206, the processor 102 uses the strings stored in the memory 104 and the original URL to generate an extended URL of the original URL. In some embodiments of the invention, the extended URL is a URL that contains the original URL. The processor 102 may compare the first several characters of a string with the last several characters of the current path, that is, with the last several characters of the original URL. When they are identical, the processor 102 replaces those last characters of the original URL with the string to form an extended URL of the original URL. Alternatively, the processor 102 may connect to the Internet 107 through the network interface 106 and access the web page corresponding to an upper-level path of the current path of the original URL, that is, the web page corresponding to a subdirectory URL or the root directory URL. The processor 102 then obtains the links recorded in that web page and uses the URLs of those links as extended URLs of the original URL. The processor 102 stores the generated extended URLs in the memory 104 and controls the screen 110 to display them.
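The second route, which takes extended URLs from the links on an upper-level page, can be sketched with Python's standard `html.parser` and `urllib.parse` modules. The class `LinkCollector` and the function `extended_from_directory` are hypothetical names, and the HTML is passed in as a string so that the sketch makes no assumption about how the page was fetched. Keeping only links that contain the original URL follows the embodiment in which the extended URL contains the original URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URLs of all <a href="..."> links in a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the directory page URL.
                self.links.append(urljoin(self.base_url, value))


def extended_from_directory(original_url: str, directory_url: str, html: str) -> list[str]:
    """Return the links on the directory page that contain the original URL."""
    collector = LinkCollector(directory_url)
    collector.feed(html)
    return [link for link in collector.links if original_url in link]
```

For a truncated original URL ending in `content/ces-aw`, a link on the `content/` directory page pointing to `ces-awarded-products` would be returned, while links to unrelated pages would be filtered out.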

Optionally, operation 206 may be performed only after the processor 102 determines that the original URL is invalid. For example, the processor 102 may connect to the Internet 107 through the network interface 106, attempt to connect to the web page corresponding to the original URL, and receive, via the Internet 107 and the network interface 106, the Hypertext Transfer Protocol (HTTP) status code returned by the server of that web page. An HTTP status code is a three-digit code that indicates the status of a web server's HTTP response. Status codes fall into five classes according to their first digit: informational (1XX), successful (2XX), redirection (3XX), client error (4XX), and server error (5XX). The HTTP status code 200 OK indicates that the client's request succeeded. The processor 102 determines whether the received HTTP status code is 200 OK. If it is not, the processor 102 determines that the original URL is invalid and performs operation 206. If it is, the processor 102 determines that the original URL is valid and does not perform operation 206.
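The status code classification and the decision rule above can be written as a small Python sketch. The names `STATUS_CLASSES`, `status_class`, and `should_expand` are hypothetical; the sketch takes the status code as an argument and leaves the network request itself out.

```python
# The five status code classes, keyed by the first digit of the code.
STATUS_CLASSES = {
    1: "informational",
    2: "successful",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Classify a three-digit HTTP status code by its first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


def should_expand(code: int) -> bool:
    """Operation 206 runs only when the original URL did not return 200 OK."""
    return code != 200
```

Note that under this rule any code other than 200, including other 2XX codes and redirections, causes the original URL to be treated as invalid, which matches the text as written.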

Referring to FIG. 3, FIG. 3 is a flow chart of an embodiment of the present invention, schematically illustrating a method 300 for generating an extended URL using the strings in a picture. The method 300 may be performed by the processor 102 running the machine-readable instructions 105 in the memory 104. In operation 302, the user activates the OCR function on a picture (or image file), causing the processor 102 to receive or read the picture. The picture may be captured by the camera 108 and sent to the processor 102 and/or the memory 104, or it may be a picture previously stored in the memory 104 and read by the processor 102. In operation 304, the processor 102 uses OCR as described above to recognize a plurality of strings in the picture and identifies the original URL among them. The processor 102 stores the recognized strings and the original URL in the memory 104. The processor 102 then displays on the screen 110 a hyperlink corresponding to the original URL for the user to click. For example, the processor 102 may provide a hyperlink to the website corresponding to the original URL. Alternatively, after the user clicks the hyperlink, the processor 102 attempts to connect to the original URL through the network interface 106.

In operation 306, the processor 102 determines whether the number of characters in the current path of the original URL (the path information of the current level) exceeds a predetermined number. That is, the processor 102 determines whether the number of characters after the last slash character of the original URL (the first slash counting from the right) exceeds the predetermined number. The predetermined number may be 2, 3, 4, 5, or more than 5. If operation 306 is skipped, or the predetermined number is set too small, the extended URLs found later will be imprecise. If the processor 102 determines that the number of characters after the last slash character does not exceed the predetermined number, the processor 102 performs operation 318, in which it displays only the original URL on the screen 110 for the user's reference. If the processor 102 determines that the number of characters in the current path exceeds the predetermined number, the processor 102 performs operation 308.
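The length check of operation 306 can be illustrated as follows. The function names and the default threshold of 3 are assumptions made for this sketch; the text only says the predetermined number may be 2, 3, 4, 5, or more.

```python
def current_path(url: str) -> str:
    """Return the part of the URL after its last slash character.

    If the URL ends with a slash the result is empty; if it contains no
    slash at all the whole string is returned.
    """
    return url.rsplit("/", 1)[-1]


def exceeds_threshold(url: str, min_chars: int = 3) -> bool:
    """True if the current path has more characters than the threshold."""
    return len(current_path(url)) > min_chars
```

With a threshold of 3, a URL ending in `/ces-aw` would proceed to operation 308, while one ending in `/ce` would go to operation 318 and be displayed as is. A short current path would match too many unrelated strings, which is presumably why the text warns against a small threshold.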

In operation 308, the processor 102 searches the strings previously recognized in the picture in operation 304. In some embodiments, the processor 102 reads each string that was stored in the memory 104 in operation 304. In operation 310, the processor 102 compares the strings it has read with the original URL. In some embodiments, the processor 102 first determines how many characters the current path of the original URL contains; a current path of 5 characters is used as the example here. The processor 102 then checks whether the first 5 characters of each string match the current path of the original URL. A matching string may be identical to the current path of the original URL, or it may differ from the current path only in letter case, meaning that the letters are the same but one is uppercase where the other is lowercase. For example, "abc", "Abc", and "AbC" differ only in letter case, so the processor 102 treats them as matching. When the processor 102 finds strings whose first 5 characters match the current path of the URL, it may mark them as matching strings.
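A minimal Python sketch of the comparison in operations 308 and 310 is shown below. The function name `matching_strings` is hypothetical, and lowercasing both sides is one simple way to ignore letter case; the empty-path guard is an added assumption so that an empty current path does not match every string.

```python
def matching_strings(strings: list[str], current: str) -> list[str]:
    """Return the OCR strings whose first n characters match the current path.

    n is the length of the current path; the comparison ignores letter case.
    """
    n = len(current)
    if n == 0:
        return []  # assumption: an empty current path matches nothing
    key = current.lower()
    return [s for s in strings if s[:n].lower() == key]
```

Given the current path `ces-a`, both `ces-awarded-products` and `CES-Awarded-Products` would be marked as matching, while a string such as `Acer news` would not.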

In operation 312, the processor 102 uses the matching strings to generate candidate URLs for the extended URL. In some embodiments, the processor 102 removes the current path (the last 5 characters) from the original URL to obtain the upper-level URL, and then appends a matching string to the upper-level URL to generate a candidate URL. In other words, the processor 102 replaces the current path of the URL with the matching string to generate a candidate URL for the extended URL. Alternatively, in some embodiments, the processor 102 may first remove the first 5 characters of the matching string and then append the remainder to the original URL to generate a candidate URL. When the matching string differs from the current path of the URL only in letter case, the processor 102 may first convert the matching string entirely to uppercase or entirely to lowercase, and then either replace the current path of the original URL with the converted string or remove the first 5 characters of the converted string and append the remainder to the original URL. When there are multiple matching strings, the processor 102 generates multiple candidate URLs, and may record these candidate URLs as a list stored in the memory 104.
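The two ways of building candidates in operation 312 can be sketched side by side. The function names, the `lowercase` flag, and the `example.com` URL are assumptions for illustration; only the lowercase conversion is shown, although the text also allows uppercase.

```python
def candidates_by_replacement(original_url: str, matches: list[str],
                              lowercase: bool = False) -> list[str]:
    """Replace the current path of original_url with each matching string."""
    parent, _, _ = original_url.rpartition("/")
    out = []
    for s in matches:
        s = s.lower() if lowercase else s
        out.append(f"{parent}/{s}")
    return out


def candidates_by_append(original_url: str, matches: list[str],
                         lowercase: bool = False) -> list[str]:
    """Drop the first n characters of each matching string (n = length of the
    current path) and append the remainder to original_url."""
    n = len(original_url.rpartition("/")[2])
    out = []
    for s in matches:
        s = s.lower() if lowercase else s
        out.append(original_url + s[n:])
    return out


if __name__ == "__main__":
    url = "https://example.com/docs/intro"  # hypothetical original URL
    print(candidates_by_replacement(url, ["Introduction"]))
    print(candidates_by_append(url, ["Introduction"]))
```

The two variants differ only in the casing of the shared prefix: replacement keeps the casing of the recognized string, while appending keeps the casing of the original URL.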

In operation 314, the processor 102 performs a connection test on all candidate URLs and removes from the list any candidate URL that fails the test. The processor 102 may connect to the Internet 107 through the network interface 106, attempt to connect to each candidate URL, and receive the HTTP status code returned by the remote server. When the received HTTP status code is not 200 OK, the processor 102 removes that candidate URL from the list. This ensures that every extended URL subsequently presented to the user can be reached. In operation 316, the processor 102 displays the candidate URLs remaining in the list on the screen 110 as extended URLs. In some embodiments, the processor 102 may display the original URL and the extended URLs at the same time.
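One possible shape for the connection test of operation 314, using only the Python standard library. The names `http_status` and `filter_reachable` are hypothetical, and the use of a `HEAD` request is an assumption: the text only says the processor receives an HTTP status code, and some servers answer `HEAD` differently from `GET`. Note also that `urlopen` follows redirects, so the status seen here is that of the final response.

```python
from typing import Callable, Iterable, Optional
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response was received."""
    request = Request(url, method="HEAD")  # assumption: HEAD is enough for a reachability test
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # e.g. 404 for a page that does not exist
    except (OSError, ValueError):
        return None  # DNS failure, timeout, refused connection, malformed URL


def filter_reachable(candidates: Iterable[str],
                     get_status: Callable[[str], Optional[int]] = http_status) -> list[str]:
    """Keep only the candidates whose status code is exactly 200."""
    return [url for url in candidates if get_status(url) == 200]


if __name__ == "__main__":
    # A stub status lookup keeps the demonstration offline.
    fake = {"https://example.com/a": 200, "https://example.com/b": 404}
    print(filter_reachable(fake, get_status=fake.get))
```

Passing the status lookup as a parameter keeps the filtering logic testable without network access.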

Referring to FIG. 4, FIG. 4 shows an example image 400 according to an embodiment of the present invention. The operations of method 300 are described below using the image 400 as an example. In operation 302, the user applies the OCR function to the image 400. In operation 304, the processor 102 recognizes a plurality of strings from the image 400, including strings 401, 402, 403, and 404. The processor 102 determines that string 404 is an original URL (also referred to as original URL 404) and provides a hyperlink for the original URL 404 to the user. In operation 306, the processor 102 determines whether the current path of the original URL 404 exceeds the predetermined number of characters, which is 3 in this example. The original URL 404 recognized by the processor 102 is https://www.acer.com/ac/zh/TW/content/ces-aw, and the part after its last slash character (the current path) is "ces-aw". Because "ces-aw" has more than 3 characters, the processor 102 proceeds to operation 308.

In operation 308, the processor 102 reads a plurality of strings including strings 401, 402, and 403. In operation 310, the processor 102 compares the strings it has read with the original URL 404. Strings 401, 402, and 403 are used as examples here. Because there are 6 characters after the last slash character of the original URL 404 (the first slash character counted from the right), the processor 102 determines whether the first 6 characters of strings 401, 402, and 403 are the same as "ces-aw". String 401 is "Ces-Awarded-Products". Because its first 6 characters, "Ces-Aw", differ from "ces-aw" only in letter case, the processor 102 determines that string 401 is a matching string. String 402 is "Aspire". Because its first 6 characters, "Aspire", are different from "ces-aw", the processor 102 determines that string 402 is not a matching string. String 403 is "immersive". Because its first 6 characters, "immers", are different from "ces-aw", the processor 102 determines that string 403 is not a matching string. Suppose the processor 102 also recognizes the string "ces-awsome" from the image 400. Because its first 6 characters, "ces-aw", are the same as "ces-aw", the processor 102 determines that "ces-awsome" is a matching string.

In operation 312, the processor 102 uses string 401, previously determined to be a matching string, to generate candidate URLs for the extended URL of the original URL 404. The processor 102 may remove the first 6 characters of string 401, leaving "arded-Products", and then append "arded-Products" to the original URL 404 to generate the candidate URL https://www.acer.com/ac/zh/TW/content/ces-awarded-Products. In some embodiments, the processor 102 may instead convert string 401 entirely to lowercase, yielding "ces-awarded-products", and then append "arded-products" to the original URL 404 to generate the candidate URL https://www.acer.com/ac/zh/TW/content/ces-awarded-products. In addition, if the processor 102 has recognized the string "ces-awsome" from the image 400, it may use that string in the same way to generate the candidate URL https://www.acer.com/ac/zh/TW/content/ces-awsome.

In operation 314, the processor 102 attempts to connect through the network interface 106 to the candidate URLs generated in operation 312, and determines whether the HTTP status code received for each is 200 OK. Suppose that only the candidate URL https://www.acer.com/ac/zh/TW/content/ces-awarded-products returns an HTTP status code of 200 OK, and that the other candidate URLs return 404 Not Found. The processor 102 removes from the list the candidate URLs whose HTTP status code is not 200 OK. In operation 316, the processor 102 displays both the original URL 404 and the extended URL https://www.acer.com/ac/zh/TW/content/ces-awarded-products on the screen 110.

Referring to FIG. 5, FIG. 5 is a flowchart according to an embodiment of the present invention, schematically illustrating a method 500 of generating extended URLs using the upper-level paths of the original URL. Method 500 may be performed by the processor 102 executing the machine-readable instructions 105 in the memory 104. Operation 502 of method 500 is the same as operation 302 of method 300, operation 504 of method 500 is the same as operation 304 of method 300, and operation 518 of method 500 is the same as operation 316 of method 300, so they are not described again here.

In operation 506, the processor 102 goes to the directory one level above the directory (web page) searched most recently. When no search has been performed yet, the processor 102 goes to the web page corresponding to the path one level above the current path of the original URL (that is, the path information of the upper level). The processor 102 may locate, in the URL of the web page currently being searched (the currently searched URL), the last slash character that is followed by other characters, and remove the part after that slash character to produce the URL of the directory one level up. Alternatively, the processor 102 may use the slash characters of the original URL to determine all of its sub-paths and its root directory path, and construct the subdirectory URLs at each level and the root directory URL in the manner described above. Whenever it needs to go to the directory one level above the currently searched directory, the processor 102 derives the upper-level URL from the currently searched URL and connects through the network interface 106 to the web page corresponding to that upper-level URL.
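The single-step derivation of the upper-level URL in operation 506 can be sketched as a small string operation. The function name `parent_url` and the `example.com` URLs are assumptions for illustration. The function does not know where the root is, so a caller is expected to stop once the root directory URL has been reached.

```python
def parent_url(url: str) -> str:
    """Return the directory URL one level above url.

    Finds the last slash that is followed by other characters and drops
    everything after it, keeping the slash itself.
    """
    trimmed = url.rstrip("/")  # ignore a trailing slash that has nothing after it
    return trimmed[: trimmed.rfind("/") + 1]


if __name__ == "__main__":
    print(parent_url("https://example.com/a/b/page"))
    print(parent_url("https://example.com/a/b/"))
```

Applying the function repeatedly walks up one directory at a time, which mirrors the loop formed by operations 506 to 512.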

In operation 508, the processor 102 determines whether the HTTP status code received after connecting to the web page corresponding to the upper-level URL is 200 OK. If the received HTTP status code is not 200 OK, the processor 102 performs operation 512. If the received HTTP status code is 200 OK, the processor 102 performs operation 510.

In operation 510, the processor 102 searches the web page corresponding to the upper-level URL and records all the links contained in the page. For example, the processor 102 may look for HTML tags in the page's Hypertext Markup Language (HTML) source and find every Hypertext Reference (href) attribute in those tags. The href attribute specifies the URL of a hyperlink target. Its value may be a relative or absolute URL of any valid document, and may include fragment identifiers and JavaScript code snippets. The href attribute holds a link. The processor 102 records the URL corresponding to that link as a candidate URL for the extended URL and stores the candidate URL in the memory 104. Alternatively, when the href attribute holds a complete URL, the processor 102 may store that URL directly in the memory 104 as a candidate URL for the extended URL.
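A minimal sketch of collecting `href` values as in operation 510, using Python's standard `html.parser` and `urllib.parse.urljoin`. The class name `HrefCollector`, the function `extract_links`, and the sample markup are hypothetical. Skipping anything that does not resolve to an `http` or `https` URL (for instance `javascript:` values) is an assumption added here, not something the text requires.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collects every href attribute and resolves it against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                absolute = urljoin(self.base_url, value)  # resolves relative links
                # Assumption: only web URLs are useful as candidates.
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)


def extract_links(html: str, base_url: str) -> list[str]:
    collector = HrefCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    page = (
        '<a href="intro-guide">Guide</a>'
        '<a href="https://example.com/about">About</a>'
        '<a href="javascript:void(0)">Menu</a>'
    )
    print(extract_links(page, "https://example.com/docs/"))
```

Resolving relative links against the page URL is what turns a recorded link into a full candidate URL.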

In operation 512, the processor 102 determines whether the currently searched URL is the root directory URL of the original URL. If it is not, the processor 102 performs operation 506 again. If it is, the processor 102 performs operation 514. The processor 102 may determine whether a URL is the root directory URL from the slash characters in the currently searched URL. For example, when the currently searched URL contains only one slash character besides those in "http://" (that is, only 3 slash characters in total), the processor 102 determines that the currently searched URL is the root directory URL of the original URL. Alternatively, the processor 102 may directly compare the currently searched URL with the root directory URL previously generated from the original URL. When the two are the same, the processor 102 determines that the currently searched URL is the root directory URL of the original URL.
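The slash-counting test of operation 512 might look like the sketch below. The function name `is_root_url` is hypothetical. Two details are assumptions added for robustness rather than taken from the text: a root written without a trailing slash (two slashes in total) is also accepted, and a URL with three slashes must end in a slash to count as a root.

```python
def is_root_url(url: str) -> bool:
    """Return True if url looks like a root directory URL such as
    "https://example.com/" (three slashes) or "https://example.com" (two)."""
    slashes = url.count("/")
    return slashes == 2 or (slashes == 3 and url.endswith("/"))


if __name__ == "__main__":
    print(is_root_url("https://example.com/"))
    print(is_root_url("https://example.com/docs/"))
```

When the walk only ever visits directory URLs that end in a slash, the second condition reduces to the "three slashes in total" rule.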

In operation 514, the processor 102 compares the candidate URLs stored in the memory 104 with the original URL and determines whether they match. The processor 102 may first determine whether a candidate URL is identical to the original URL up to the last slash character, that is, whether the two are identical in everything except the current path. After determining that they are, the processor 102 may determine whether the first M characters of the current path of the candidate URL and of the original URL are the same. M is, for example, 3, 4, 5, 10, or any suitable predetermined number. If they are the same, the processor 102 determines that the candidate URL matches the original URL. Alternatively, the processor 102 may determine whether the candidate URL contains the original URL, by determining whether the first N characters of the candidate URL are the same as the original URL, where N is the number of characters in the original URL. In addition, the processor 102 may attempt to connect to the candidate URL and determine whether the received HTTP status code is 200 OK. In operation 516, the processor 102 adds the matching candidate URLs to a list and stores the list in the memory 104.
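The first comparison strategy in operation 514 (same upper-level part, same first M characters of the current path) can be sketched as follows. The function name `matches_original`, the default `m = 3`, and the `example.com` URLs are assumptions. Rejecting originals whose current path is shorter than M is also an assumption, since the text does not cover that case.

```python
def matches_original(candidate: str, original: str, m: int = 3) -> bool:
    """True if candidate shares everything before the last slash with original
    and their current paths agree on the first m characters."""
    cand_parent, _, cand_current = candidate.rpartition("/")
    orig_parent, _, orig_current = original.rpartition("/")
    if cand_parent != orig_parent:
        return False
    if len(orig_current) < m:  # assumption: too little text to compare reliably
        return False
    return cand_current[:m] == orig_current[:m]


if __name__ == "__main__":
    original = "https://example.com/docs/intro"
    print(matches_original("https://example.com/docs/introduction", original))
    print(matches_original("https://example.com/blog/intro-2", original))
```

A larger M makes the filter stricter, at the cost of rejecting candidates that diverge early from the original current path.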

Method 500 records the links found in the web pages corresponding to the subdirectory URLs at each level of the original URL and to its root directory URL. This approach records links selectively, rather than recording every link that can be reached directly or indirectly from the web page of the root directory URL. As a result, it saves search time and makes it easier to find extended URLs related to the original URL.

Referring to FIG. 6, FIG. 6 shows an example image 600 according to an embodiment of the present invention. The operations of method 500 are described below using the image 600 as an example. In operation 502, the user applies the OCR function to the image 600. In operation 504, the processor 102 recognizes a plurality of strings from the image 600, including string 601. The processor 102 determines that string 601 is an original URL (also referred to as original URL 601) and provides a hyperlink for the original URL 601 to the user. In operation 506, the processor 102 goes to the web page corresponding to the URL one level above the original URL 601. The original URL 601 is https://www.cdc.gov.tw/Disease/SubIndex/WYbKe3aE7LiY5gb. The processor 102 may remove the current path of the original URL 601, that is, remove "WYbKe3aE7LiY5gb", to produce the upper-level subdirectory URL https://www.cdc.gov.tw/Disease/SubIndex/, and go to the web page corresponding to that URL. In operation 508, the processor 102 determines whether the HTTP status code received from the web page corresponding to the upper-level subdirectory URL is 200 OK. When the received HTTP status code is 200 OK, the processor 102 records, in operation 510, the links contained in the web page corresponding to https://www.cdc.gov.tw/Disease/SubIndex/, and then performs operation 512. In operation 512, because https://www.cdc.gov.tw/Disease/SubIndex/ contains multiple slash characters besides those in "http://", the processor 102 determines that the currently searched URL is not the root directory URL of the original URL 601 and performs operation 506. In operation 506, the processor 102 finds the last slash character in the currently searched URL https://www.cdc.gov.tw/Disease/SubIndex/ that is followed by a string, removes the part after that slash character to produce the upper-level subdirectory URL https://www.cdc.gov.tw/Disease/, and goes to the corresponding web page. Operations 506 to 512 are repeated until the web page corresponding to the root directory URL of the original URL 601, https://www.cdc.gov.tw/, has been searched.

Alternatively, in operation 506 the processor 102 may first determine the sub-paths at each level and the root directory path of the original URL 601: the current path WYbKe3aE7LiY5gb, the second-level sub-path SubIndex, the first-level sub-path Disease, and the root directory path https://www.cdc.gov.tw. It then constructs, as described above, the subdirectory URLs at each level and the root directory URL of the original URL 601: the second-level subdirectory URL https://www.cdc.gov.tw/Disease/SubIndex/, the first-level subdirectory URL https://www.cdc.gov.tw/Disease/, and the root directory URL https://www.cdc.gov.tw. The processor 102 then repeats operations 506 to 512 to search, level by level, the web pages corresponding to all the subdirectory URLs and the root directory URL, and in operation 512 it can compare the currently searched URL directly with the root directory URL.
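Building every level in one pass can be sketched with `urllib.parse.urlsplit`. The function name `directory_urls` is hypothetical. Returning the root without a trailing slash follows the form given in the paragraph above.

```python
from urllib.parse import urlsplit


def directory_urls(original_url: str) -> list[str]:
    """Return the subdirectory URLs of original_url from deepest to shallowest,
    followed by the root directory URL."""
    parts = urlsplit(original_url)
    root = f"{parts.scheme}://{parts.netloc}"
    segments = [s for s in parts.path.split("/") if s]
    urls = []
    # Skip the last segment (the current path), then peel off one level at a time.
    for depth in range(len(segments) - 1, 0, -1):
        urls.append(root + "/" + "/".join(segments[:depth]) + "/")
    urls.append(root)
    return urls


if __name__ == "__main__":
    for url in directory_urls("https://www.cdc.gov.tw/Disease/SubIndex/WYbKe3aE7LiY5gb"):
        print(url)
```

With the list built up front, the root check in operation 512 becomes a comparison against the last element.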

In operation 514, suppose one of the recorded candidate URLs is https://www.cdc.gov.tw/Disease/SubIndex/WYbKe3aE7LiY5gb1. Because the first 55 characters of this candidate URL are the same as the original URL 601, that is, the candidate URL contains the original URL 601, the processor 102 determines that this candidate URL matches the original URL 601. Likewise, the processor 102 determines that the candidate URL https://www.cdc.gov.tw/Disease/SubIndex/WYbKe3aE7LiY5gb12 and the candidate URL https://www.cdc.gov.tw/Disease/SubIndex/WYbKe3aE7LiY5gb-aa match the original URL 601. In operation 516, the processor 102 adds the candidate URLs determined to match the original URL 601 to the list, and in operation 518 it displays the URLs in the list on the screen 110 as extended URLs of the original URL 601.
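The "first N characters" comparison used in this example can be sketched as a prefix test. The function name `extends_original` is hypothetical, and requiring the candidate to be strictly longer than the original (so that the original itself is not reported as its own extension) is an assumption. The last candidate in the demonstration is a made-up non-matching URL.

```python
def extends_original(candidate: str, original: str) -> bool:
    """True if the first N characters of candidate equal original,
    where N is the length of original, and candidate adds something after it."""
    n = len(original)
    return len(candidate) > n and candidate[:n] == original


if __name__ == "__main__":
    original = "https://www.cdc.gov.tw/Disease/SubIndex/WYbKe3aE7LiY5gb"
    candidates = [
        original + "1",
        original + "12",
        original + "-aa",
        "https://www.cdc.gov.tw/Disease/SubIndex/other",  # hypothetical non-match
    ]
    print(len(original))
    print([c for c in candidates if extends_original(c, original)])
```

Here N is 55, the character count of the original URL 601.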

Referring to FIG. 7, FIG. 7 is a flowchart according to an embodiment of the present invention, schematically illustrating a method 700 of generating extended URLs using the strings in the image and/or the upper-level paths of the original URL. Method 700 may be performed by the processor 102 executing the machine-readable instructions 105 in the memory 104. Operation 702 of method 700 is the same as operation 302 of method 300 and operation 502 of method 500, and operation 704 of method 700 is the same as operation 304 of method 300 and operation 504 of method 500. Operations 706 to 714 of method 700 are the same as operations 306 to 314 of method 300. Operations 716 to 726 of method 700 are the same as operations 506 to 516 of method 500, so they are not described again here. When the processor 102 determines in operation 706 that the number of characters in the last segment of the original URL exceeds the predetermined number, it performs operations 708 to 714. Otherwise, it skips operations 708 to 714 and performs only operations 716 to 726 to find extended URLs.

In operation 728, the list may contain extended URLs generated in operations 706 to 714 from the strings in the image, as well as extended URLs generated in operations 716 to 726 from the upper-level paths of the original URL. The processor 102 may first remove duplicate URLs from the list and then display the URLs in the list on the screen 110 as extended URLs of the original URL. If the processor 102 determines in operation 706 that the number of characters in the current path of the original URL does not exceed the predetermined number, operations 708 to 714 are not performed, so the list contains only the extended URLs generated in operations 716 to 726 from the upper-level paths of the original URL.
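Removing duplicates in operation 728 while keeping the order in which the URLs were found can be sketched in a few lines. The function name `merge_candidates` and its parameter names are assumptions, and preserving order is a design choice made for this illustration rather than a stated requirement.

```python
def merge_candidates(from_image_strings: list[str],
                     from_parent_pages: list[str]) -> list[str]:
    """Combine both result lists and drop duplicates, keeping first occurrences."""
    # dict preserves insertion order, so fromkeys gives an ordered de-duplication.
    return list(dict.fromkeys(from_image_strings + from_parent_pages))


if __name__ == "__main__":
    image_results = ["https://example.com/docs/introduction"]
    page_results = [
        "https://example.com/docs/introduction",
        "https://example.com/docs/intro-2",
    ]
    print(merge_candidates(image_results, page_results))
```

When the image-string branch is skipped, the first list is simply empty and the result is the de-duplicated second list.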

The method of the embodiments of the present invention can automatically find extended URLs for a URL and fill in the missing part of the URL, providing a better user experience. The method can find extended URLs in different ways, which increases the chance of finding the correct extended URL. Moreover, using different ways to find extended URLs allows the method to be applied in more situations. For example, searching the links of upper-level directories also works when the URL is a hash code.

The above embodiments are described or illustrated as a series of operations or events. However, it should be understood that the order in which the operations or events are described is not intended to be limiting. For example, some operations may occur in a different order or concurrently with operations or events other than those described herein. In addition, one or more of the operations described herein may be carried out in one or more separate operations and/or stages.

Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the scope of the present invention. Anyone skilled in the art may make minor changes and modifications without departing from the spirit and scope of the present invention. The scope of protection of the present invention is therefore defined by the appended claims.

100: electronic device
102: processor
104: memory
105: machine-readable instructions
106: network interface
107: Internet
108: camera
110: screen
200, 300, 500, 700: method
202, 204, 206, 302, 304, 306, 308, 310, 312, 314, 316, 502, 504, 506, 508, 510, 512, 514, 516, 518, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728: operation
400, 600: image
401, 402, 403: string
404, 601: string/original URL

FIG. 1 is a block diagram of an electronic device according to an embodiment of the present invention.
FIG. 2 is a flowchart according to an embodiment of the present invention, schematically illustrating a method for generating an extended URL.
FIG. 3 is a flowchart according to an embodiment of the present invention, schematically illustrating a method of generating an extended URL using strings in an image.
FIG. 4 is an example image according to an embodiment of the present invention.
FIG. 5 is a flowchart according to an embodiment of the present invention, schematically illustrating a method of generating an extended URL using the upper-level paths of a URL.
FIG. 6 is an example image according to an embodiment of the present invention.
FIG. 7 is a flowchart according to an embodiment of the present invention, schematically illustrating a method of generating an extended URL using strings in an image and/or the upper-level paths of a URL.

700: method

702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728: operation

Claims (8)

一種產生擴展網址的方法,包含:使用一處理器利用光學字元辨識技術,從一圖片中辨識出至少一字串及一原始網址,其中上述原始網址包含一當前層級的路徑資訊以及上述當前層級之一上層層級的路徑資訊,上述上層層級的路徑資訊對應一目錄及一目錄網頁;以及上述處理器使用上述圖片中所辨識出的上述字串或上述目錄網頁中所記載的至少一連結,產生上述原始網址所相關的至少一個擴展網址;其中產生上述擴展網址的操作,包含:判斷上述當前層級的路徑資訊是否超過一預定字數;回應於判斷上述當前層級的路徑資訊超過上述預定字數時,判斷上述字串的前n個字元是否與上述當前層級的路徑資訊相符,n為上述當前層級的路徑資訊的字元數;以及回應於上述字串的前n個字元與上述當前層級的路徑資訊相符,將上述字串的前n個字元去除並接到上述原始網址的上述當前層級的路徑資訊後,產生上述擴展網址的至少一個候選網址。 A method for generating an extended URL, including: using a processor to use optical character recognition technology to identify at least one string and an original URL from an image, wherein the original URL includes path information of a current level and the current level. An upper-level path information, the above-mentioned upper-level path information corresponds to a directory and a directory webpage; and the above-mentioned processor uses the above-mentioned string recognized in the above-mentioned picture or at least one link recorded in the above-mentioned directory webpage to generate At least one extended URL related to the above-mentioned original URL; the operation of generating the above-mentioned extended URL includes: judging whether the path information of the above-mentioned current level exceeds a predetermined number of words; responding to judging whether the path information of the above-mentioned current level exceeds the above-mentioned predetermined number of words. , determine whether the first n characters of the above string are consistent with the path information of the above current level, n is the number of characters of the path information of the above current level; and respond to whether the first n characters of the above string are consistent with the above current level. The path information matches the path information of the string. After removing the first n characters of the string and connecting it to the path information of the current level of the original URL, at least one candidate URL of the extended URL is generated. 
如請求項1之產生擴展網址的方法,其中產生上述擴展網址的操作更包含:判斷上述候選網址所對應的網頁之Http狀態碼是否為200 OK;以及回應於上述Http狀態碼為200 OK,上述處理器以Http狀態碼為 200 OK的上述候選網址作為上述擴展網址。 For example, the method for generating an extended URL in request item 1, wherein the operation of generating the above-mentioned extended URL further includes: determining whether the HTTP status code of the webpage corresponding to the above-mentioned candidate URL is 200 OK; and responding to the above-mentioned Http status code being 200 OK, the above-mentioned The handler uses the HTTP status code as The above candidate URL of 200 OK is used as the above extension URL. 如請求項1之產生擴展網址的方法,其中產生上述擴展網址的操作更包含:判斷上述目錄網頁的Http狀態碼是否為200 OK;以及回應於上述目錄網頁的Http狀態碼為200 OK,上述處理器使用上述目錄網頁中記載的上述連結產生上述擴展網址。 For example, the method for generating an extended URL in request item 1, the operation of generating the above-mentioned extended URL further includes: determining whether the HTTP status code of the above-mentioned directory webpage is 200 OK; and responding to the HTTP status code of the above-mentioned directory webpage is 200 OK, and the above processing The server generates the above-mentioned extended URL using the above-mentioned link recorded in the above-mentioned directory web page. 如請求項1之產生擴展網址的方法,其中產生上述擴展網址的操作,包含:將上述目錄網頁中所記載的所有上述連結所對應的網址作為上述擴展網址的上述候選網址。 For example, the method for generating an extended URL in request item 1, wherein the operation of generating the above-mentioned extended URL includes: using the URLs corresponding to all the above-mentioned links recorded in the above-mentioned directory web page as the above-mentioned candidate URLs for the above-mentioned extended URL. 如請求項4之產生擴展網址的方法,其中上述處理器將包含上述原始網址的上述候選網址作為上述擴展網址。 For example, in the method of generating an extended URL in request item 4, the processor uses the candidate URL including the original URL as the extended URL. 
6. The method for generating an extended URL of claim 1, further comprising displaying the extended URL on a screen.
7. The method for generating an extended URL of claim 1, wherein the path information of the current level is the portion of the original URL after its last slash character.
8. An electronic device, comprising: a processor; a memory storing one or more machine-readable instructions to be read and executed by the processor; and a network interface connecting the electronic device to a network; wherein the machine-readable instructions, when read and executed by the processor, cause the processor to perform the method of any one of claims 1 to 7.
TW111119792A 2022-05-27 2022-05-27 Method for produce extended web address and electronic device TWI812243B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW111119792A TWI812243B (en) 2022-05-27 2022-05-27 Method for produce extended web address and electronic device

Publications (2)

Publication Number Publication Date
TWI812243B true TWI812243B (en) 2023-08-11
TW202347142A TW202347142A (en) 2023-12-01

Family

ID=88585663

Family Applications (1)

Application Number Title Priority Date Filing Date
TW111119792A TWI812243B (en) 2022-05-27 2022-05-27 Method for produce extended web address and electronic device

Country Status (1)

Country Link
TW (1) TWI812243B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110107193A1 (en) * 2000-02-22 2011-05-05 Sony Corporation Method of replacing content

Also Published As

Publication number Publication date
TW202347142A (en) 2023-12-01

Similar Documents

Publication Publication Date Title
JP5069285B2 (en) Propagating useful information between related web pages, such as web pages on a website
US20090089278A1 (en) Techniques for keyword extraction from urls using statistical analysis
US7536445B2 (en) Enabling a web-crawling robot to collect information from web sites that tailor information content to the capabilities of accessing devices
US9521161B2 (en) Method and apparatus for detecting computer fraud
KR102274561B1 (en) Transaction system error detection method, apparatus, storage medium and computer device
US8438279B2 (en) Identifying content that is responsive to a request for an invalid URL
CN108900554B (en) HTTP asset detection method, system, device and computer medium
CN103810268B (en) Search result recommendation information loading method, device and system and URL detection method, device and system
WO2015109928A1 (en) Method, device and system for loading recommendation information and detecting url
US7865821B2 (en) Electronic document update notification device and electronic document update notifying method
CN108491715A (en) Generation method, device and the server in Terminal fingerprints library
CN103793508B (en) A kind of loading recommendation information, the methods, devices and systems of network address detection
CN109547294A (en) Networking equipment model detection method and device based on firmware analysis
JPWO2020044469A1 (en) Rogue Web Page Detection Device, Control Method and Control Program for Rogue Web Page Detection Device
TWI812243B (en) Method for produce extended web address and electronic device
WO2015074455A1 (en) Method and apparatus for computing url pattern of associated webpage
US11429688B2 (en) Correcting a URL within a REST API call
CN117540374A (en) File scanning method and device
CN106959975B (en) Transcoding resource cache processing method, device and equipment
JP2011186639A (en) Content relation management system, content relation management device, content relation management method and program
JP2011209886A (en) Method, program, and device for annotation
US20240037214A1 (en) Information processing device, information processing method, and computer readable medium
JP2009223538A (en) Information providing method, information providing device, information providing system, and computer program
JP2014089692A (en) Information providing server
CN112528117B (en) Recognition method and related device for government affair website primary catalog