KR20010082966A

KR20010082966A - Method and system for providing related web sites for the current visitting of client

Info

Publication number: KR20010082966A
Application number: KR1020000008551A
Authority: KR
Inventors: 백윤주; 백인혁
Original assignee: 백윤주; 주식회사 원큐
Priority date: 2000-02-22
Filing date: 2000-02-22
Publication date: 2001-08-31
Also published as: KR100371805B1

Abstract

PURPOSE: A method and system for providing related web sites are provided that, while a user is using the Internet, extract web sites closely related to the web site the user is currently visiting and offer the resulting list to the user in real time. CONSTITUTION: One or more standard URLs are created by normalizing the URL of each web site in an Internet bookmark DB (100). One or more related URLs are created by extracting the relations between web sites from the standard URLs (200). One or more related URL headings are created by visiting each URL through a server agent (300). Web sites closely related to the web site the user is currently visiting are then retrieved from the created related URLs and related URL headings and provided to the user in real time.

Description

Method and system for providing related web sites for the current visitting of client

The present invention relates to a method for providing related web sites and a system for performing the same, and more particularly, to a method for extracting related web sites using an Internet bookmark DB and providing them to the user, and a system for performing the same.

Until the early 1980s, the Internet was used mainly by small groups of specialists at research institutes. With the advent of the World Wide Web (WWW), its use has expanded to general commercial purposes and grown explosively. As of the end of November 1999, the United States had about 110 million Internet users, 43% of the world's roughly 259 million, and Korea ranked 10th in the world with 5,688,000 users (source: Almanac, USA). Considering Korea's population, more than half of the population is expected to be using the Internet by 2002.

Individual PC users who wish to access the Internet typically do so using application software known as a web browser. The web browser establishes a connection over the Internet to another computer known as a web server and receives from it the information to be displayed on the user's PC. The information sent from the web server to the web browser is formatted in a special language called Hypertext Markup Language (HTML) and is organized into units commonly known as web pages.

Meanwhile, Internet users want to quickly find, among the countless Internet sites offered in every field, including news and media, entertainment, finance, shopping, science and technology, and culture, the sites that suit them and to use the corresponding services. A representative way of meeting this need is to use the services of Internet search companies such as Yahoo or Lycos.

These search companies find, classify, and store data from Internet sites in a database, either manually or with automated search robots. When a user enters a search word, they search the database and return information on sites matching that word; alternatively, they present a classification tree that they have built, and the user descends the tree to find the desired site.

Besides the search companies themselves, a desired Internet site can also be reached through hub sites, which have recently emerged and are actively used. A hub site is a web site that provides directions to related web sites from a single place.

Users can store Internet sites they visit often, found through a search service or by other routes, as bookmarks on their local disk so that they can easily revisit those sites later.

The present invention has been made in view of these points. An object of the present invention is to provide a method for providing related web sites that, while a user is navigating the Internet, extracts web sites highly related to the web site the user is currently visiting and provides the related list to the user in real time.

Another object of the present invention is to provide a system for performing the above method for providing related web sites.

FIG. 1 is a diagram illustrating a related web site extraction system according to an embodiment of the present invention.

FIG. 2 is a flowchart illustrating a method for providing related web sites according to an embodiment of the present invention.

FIG. 3 is a flowchart illustrating a URL standardization process according to an embodiment of the present invention.

FIG. 4 is a flowchart illustrating a process of extracting related URLs according to an embodiment of the present invention.

FIG. 5 is a flowchart illustrating a process of extracting URL headings according to an embodiment of the present invention.

FIG. 6 is a flowchart illustrating a process of determining identical URLs according to an embodiment of the present invention.

<Description of reference numerals for the main parts of the drawings>

100: bookmark DB
200: URL standardization unit
210: determination target DB
220: same-URL determination unit
230: duplicate domain DB
240: URL normalization unit
250: standard URL DB
300: related URL extraction unit
310: URL relation extraction unit
320: related URL DB
400: URL heading construction unit
410: URL heading extraction unit
420: URL heading DB
500: service providing unit
510: web server

A method for providing related web sites according to one aspect for achieving the above object of the present invention comprises:

(a) normalizing the URL of each web site from an Internet bookmark DB to generate one or more standard URLs;

(b) extracting the relations between web sites from the standard URLs to generate one or more related URLs;

(c) visiting each URL through a server agent to generate one or more related URL headings; and

(d) retrieving sites highly related to the Internet web site the user is currently visiting from the one or more related URLs generated in step (b) and the one or more related URL headings generated in step (c), and providing them to the user in real time.

In addition, a related web site extraction system according to one aspect for achieving the other object of the present invention comprises:

an Internet bookmark DB that stores the URLs of one or more web sites;

a URL standardization unit that normalizes the URLs stored in the Internet bookmark DB to build a standard URL DB;

a related URL extraction unit that extracts the relations between web sites from the standard URL DB to build a related URL DB;

a URL heading construction unit that visits each URL through a server agent, extracts the headings of the related URLs, and builds a related URL heading DB; and

a service providing unit that, based on the information stored in the related URL DB and the URL heading DB, provides the user with one or more sites highly related to the Internet web site the user is currently visiting.

According to this method for providing related web sites and the system for performing it, web sites highly related to the web site the user is currently visiting on the Internet can be extracted and the related list can be provided to that user in real time.

Embodiments will now be described so that a person of ordinary skill in the art can readily practice the present invention.

As shown in FIG. 1, the related web site extraction system according to an embodiment of the present invention includes a bookmark DB (100), a URL standardization unit (200), a related URL extraction unit (300), a URL heading construction unit (400), and a service providing unit (500).

The bookmark DB (100) holds the bookmark information of Internet users: subscribers store their bookmarks, preferably as Uniform Resource Locator (hereinafter, URL) addresses, so that they can access their bookmarks online over the Internet.

Here, a URL expresses the network service needed to locate and interpret a particular piece of information, file, or resource on the Internet; it can denote any file or service on the Internet, and the corresponding data can be retrieved directly through it.

For example, in 'http://www.trumpet.com.ar/', 'http' denotes the protocol and 'www.trumpet.com.ar' denotes the place to connect to. The part of the URL up to the colon (:) indicates the access method, and the part after the colon indicates the location of the data or the address of the server providing the service. The remainder indicates the port number to connect to and the name of the file to access.
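The decomposition described above can be illustrated with a minimal sketch using Python's standard `urllib.parse` module. The function name `describe_url` and the second URL with an explicit port and file name are assumptions introduced only for illustration; they do not appear in the specification.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts named in the description (illustrative helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,   # access method, the part before the colon
        "host": parts.hostname,   # address of the server providing the service
        "port": parts.port,       # None when the URL gives no explicit port
        "path": parts.path,       # file name to access
    }


info = describe_url("http://www.trumpet.com.ar/")
# Hypothetical variant with an explicit port and file name.
info_with_port = describe_url("http://www.trumpet.com.ar:8080/index.html")
```

For the URL in the text, the sketch yields the scheme 'http' and the host 'www.trumpet.com.ar', with no explicit port.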

The URL standardization unit (200) includes a determination target DB (210), a same-URL determination unit (220), a duplicate domain DB (230), a URL normalization unit (240), and a standard URL DB (250); it normalizes the URLs stored in the Internet bookmark DB (100) and stores them in the standard URL DB (250).

More specifically, the same-URL determination unit (220) extracts, from the domain names contained in the URLs in the bookmark DB (100), those domain names that need to be checked for whether they refer to the same target, and stores them in the determination target DB (210). It then reads the determination target domains from the determination target DB (210), checks over the Internet whether they represent the same site, and updates the information stored in the duplicate domain DB (230).

The URL normalization unit (240) refers to the bookmark DB (100) and the duplicate domain DB (230) to convert URLs expressed in various forms into a standard format, and the standard URL DB (250) stores each URL converted into the standard format by the URL normalization unit (240) together with the unique number of the folder containing the URL.

The related URL extraction unit (300) includes a URL relation extraction unit (310) and a related URL DB (320). It reads URL and folder information from the standard URLs stored in the standard URL DB (250), finds the URLs highly related to each URL, and stores <URL, relation information> entries for the top N most related URLs in the related URL DB (320).

The URL heading construction unit (400) includes a URL heading extraction unit (410) and a URL heading DB (420); it visits each URL on the Internet through a server agent, extracts the URL heading, and stores it in the URL heading DB (420).

More specifically, for a URL in the related URL DB (320) whose heading has not yet been created, or whose heading was created long enough ago that it must be checked again, the URL heading extraction unit (410) reads the URL from the Internet and stores its heading in the URL heading DB (420).

Once the related URL DB (320) and the URL heading DB (420) have been built, the service providing unit (500) provides the user, through the web server (510), with one or more sites highly related to the Internet web site the user is currently visiting, based on the related URL information and the URL heading information. Of course, if no site related to the web site the user is visiting is found, the service may not be provided.

The client unit (600) running on the user's PC queries the web server (510) with the URL of the Internet web site the user is currently visiting. The web server (510) looks up the related URLs and their degrees of relation for that URL in the related URL DB (320) and the URL heading DB (420), and transmits the retrieved related URLs, headings, degree-of-relation data, and so on to the client unit (600), so that the client unit (600) can display the information to the user in real time.
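The lookup performed by the web server (510) might be sketched as follows. This is only an illustration under stated assumptions: the two DBs are modeled as in-memory dictionaries, the function name `lookup_related` and the sample URLs are hypothetical, and the specification does not prescribe any particular storage layout or response format.

```python
from typing import Dict, List, Tuple


def lookup_related(
    current_url: str,
    related_db: Dict[str, List[Tuple[str, int]]],
    heading_db: Dict[str, str],
) -> List[dict]:
    """Return related URL, heading, and degree of relation for the visited URL.

    related_db stands in for the related URL DB (320) and heading_db for the
    URL heading DB (420); both layouts are assumptions made for this sketch.
    """
    results = []
    for url, degree in related_db.get(current_url, []):
        results.append(
            {"url": url, "heading": heading_db.get(url, ""), "degree": degree}
        )
    # An empty list corresponds to the case where no related site is found
    # and no service is provided.
    return results


# Hypothetical sample data.
related_db = {"www.a.com": [("www.b.com", 5), ("www.c.com", 2)]}
heading_db = {"www.b.com": "B Site"}
found = lookup_related("www.a.com", related_db, heading_db)
not_found = lookup_related("www.x.com", related_db, heading_db)
```

The sketch assumes the URL sent by the client unit has already been converted to the standard format, so that it can be used directly as a key.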

Among the parts whose information is modified automatically by the constituent modules in the overall operation described above, the determination target DB (210), the duplicate domain DB (230), and the URL heading DB (420) may contain insufficient information when filled by automation alone, so an administrator is allowed to modify these DBs manually.

Referring to FIGS. 1 and 2, the URL of each web site from the Internet bookmark DB (100) is first normalized to generate one or more standard URLs, which are stored in the standard URL DB (250) (step S100).

Next, the relations between web sites are extracted from the standard URL information stored in the standard URL DB (250) to generate one or more related URLs, which are stored in the related URL DB (320) (step S200).

Next, each URL is visited through the server agent to generate one or more related URL headings, which are stored in the URL heading DB (420) (step S300).

Next, the sites most highly related to the Internet web site the user is currently visiting are retrieved from the one or more related URLs generated in step S200 and the one or more related URL headings generated in step S300, and are provided to the user in real time (step S400).

As shown in FIG. 3, the URL standardization process first checks whether the URL service is HTTP or HTTPS (step S210); if it is neither, the URL is excluded from standardization (step S220) and the process ends. Depending on the application, other services, for example FTP, may of course also be included.

If the URL service is HTTP or HTTPS in step S210, the service name is removed (step S230). For example, for a URL written as 'http://www.microsoft.com', the 'http://' part is removed.

Next, the duplicate domain DB is consulted to check whether there is a representative domain corresponding to the domain name of the URL (step S240); if there is, the domain name in the URL is replaced with the representative domain (step S245).

If there is no representative domain in the duplicate domain DB in step S240, or after step S245, it is checked whether the final file name of the URL is a basic web file (step S250).

If the final file name of the URL is a basic web file in step S250, the final file name is removed from the URL (step S255). For example, a basic web file such as 'default.asp' or 'index.html' is removed.

If the final file name of the URL is not a basic web file in step S250, or after step S255, the directory indication is removed to generate the standard URL (step S260). If necessary, various methods can be used here, such as abbreviating the URL to its host name or restricting it to the parent directory of the final path level.

For example, the slash (/) can be removed from 'www.microsoft.com/' to abbreviate it to the parent directory level, 'www.microsoft.com'.

The URL standardized through the above process can be used to look up related URLs in real time in the web service or to be stored in the standard URL DB (250).
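The standardization steps S210 to S260 can be sketched as below. This is a minimal illustration, not the implementation of the invention: the function name `standardize_url`, the contents of `DEFAULT_FILES` beyond the two file names given in the text, the lowercasing of the host, and the dictionary used for the duplicate domain DB are all assumptions.

```python
from typing import Dict, Optional

# 'default.asp' and 'index.html' come from the text; the others are assumed.
DEFAULT_FILES = {"default.asp", "index.html", "index.htm", "default.htm"}


def standardize_url(url: str, duplicate_domains: Dict[str, str]) -> Optional[str]:
    """Return the standard form of a URL, or None if it is excluded."""
    url = url.strip()
    # S210/S220: only HTTP and HTTPS URLs are standardized.
    for prefix in ("http://", "https://"):
        if url.lower().startswith(prefix):
            rest = url[len(prefix):]  # S230: remove the service name
            break
    else:
        return None
    host, _, path = rest.partition("/")
    host = host.lower()
    # S240/S245: replace the domain with its representative domain, if any.
    host = duplicate_domains.get(host, host)
    # S250/S255: drop a final file name that is a basic web file.
    segments = [segment for segment in path.split("/") if segment]
    if segments and segments[-1].lower() in DEFAULT_FILES:
        segments.pop()
    # S260: remove the directory indication (here, the trailing slash).
    return "/".join([host] + segments)


standard = standardize_url("http://www.microsoft.com/", {})
```

With this sketch, 'http://www.microsoft.com/' reduces to 'www.microsoft.com', matching the example in the text. The further options mentioned for step S260, such as abbreviating to the host name only, are not implemented here.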

The related URL generation step described with reference to FIG. 2 uses the constructed standard URL DB (250) to find, for each URL, the URLs highly related to it and to store them in a DB; the detailed process follows the algorithm shown in FIG. 4.

Referring to FIGS. 1 and 4, the distinct URLs in the standard URL DB (250) are the targets for building the related URL DB, so the following process is carried out for each of these distinct URLs (referred to, for convenience of description, as the first URL group u1).

First, all folders in the standard URL DB (250) to which the first URL group (u1) belongs are obtained; this set is called F1 (step S310). This is needed because one user may file the first URL group (u1) under folder A while another user files the same URL under folder B. That is, every folder in F1 contains the first URL group (u1).

Next, for each folder in F1 obtained in step S310, all URLs belonging to that folder are found and placed in the second URL group (u2) (step S320). That is, every URL in the second URL group (u2) has been filed in the same folder as the first URL group (u1) by at least one user.

Next, the distinct URLs in the second URL group (u2) obtained in step S320 are found and placed in the third URL group (u3) (step S330).

Next, for every URL in the third URL group (u3), the number of times it appears in the second URL group (u2) is counted (step S340); the URLs are sorted by this frequency, and the top N URLs and their frequencies are stored in the related URL DB (320) (step S350).

As described above, the basic assumption of related URL extraction is that, in the URL and folder data held in the bookmark DB, users have filed URLs of the same category in the same folder. This assumption can be considered valid because users generally put URLs of the same category into the same folder when building their bookmarks.

Therefore, if a particular first URL and another particular second URL appear in the same folder with high frequency across the whole DB, the two URLs can be regarded as highly correlated, and the second URL is generally a meaningful site for a user visiting the first URL.
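The co-occurrence counting of steps S310 to S350 can be sketched as follows. The sketch assumes the standard URL DB is available as a mapping from folder number to the set of standard URLs in that folder; the function name `related_urls`, the sample data, the alphabetical tie-break, and the exclusion of u1 from its own result list are assumptions not stated in the specification.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def related_urls(
    u1: str, folders: Dict[int, Set[str]], top_n: int
) -> List[Tuple[str, int]]:
    """Rank URLs by how often they share a folder with u1."""
    # S310: F1 is the set of folders that contain u1.
    f1 = [folder_id for folder_id, urls in folders.items() if u1 in urls]
    # S320: u2 collects every URL from those folders, with repetition
    # (u1 itself is skipped, which is an assumption of this sketch).
    u2 = [url for folder_id in f1 for url in folders[folder_id] if url != u1]
    # S330/S340: the Counter keys are the distinct URLs (u3) and the values
    # are their frequencies in u2.
    counts = Counter(u2)
    # S350: sort by descending frequency and keep the top N.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical bookmark folders from three users.
sample_folders = {
    1: {"a.com", "b.com", "c.com"},
    2: {"a.com", "b.com"},
    3: {"c.com", "d.com"},
}
top = related_urls("a.com", sample_folders, 2)
```

In the sample data, 'b.com' shares two folders with 'a.com' and 'c.com' shares one, so 'b.com' is ranked first.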

The URL heading DB construction described in step S300 of FIG. 2 finds the heading of each URL and stores it in the DB so that, when a URL is presented to the user, what the URL means can be explained alongside the URL itself. That is, it is the process of finding the heading of each URL stored in the related URL DB, and it proceeds through the steps shown in FIG. 5.

Referring to FIGS. 1 and 5, it is first checked whether the heading of the target URL is in the URL heading DB (step S410).

If the heading is in the existing DB, it is checked whether the defined period has elapsed since the last confirmation date (step S420); if it has elapsed, construction of the URL heading DB ends.

If the defined period has not elapsed since the last confirmation date in step S420, it is checked whether the URL heading DB lacks a heading for the URL, or whether the defined period has not elapsed since the last modification date of the URL heading and the URL heading was updated automatically (step S430).

If, in step S430, the URL heading DB contains a heading for the URL, the process returns to step S410; it also returns to step S410 if the defined period has elapsed since the last modification date of the URL heading or if the URL heading was not updated automatically.

If, in step S430, the URL heading DB does not contain a heading for the URL, or if the defined period has not elapsed since the last modification date of the URL heading and the URL heading was generated automatically, then, for URLs whose last confirmation date exceeds the defined period, the corresponding web server is visited and its HTML is read (step S440), and the heading is extracted from the received URL data (step S450).

The heading extracted in step S450 is written to the URL heading DB (step S460).
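Heading extraction and the re-check condition can be sketched as below. The specification does not say which part of the HTML serves as the heading; taking the contents of the `<title>` element is an assumption of this sketch, as are the names `TitleParser`, `extract_heading`, and `needs_refresh`. The refresh rule follows the summary given for the URL heading extraction unit (410), namely a missing heading or one older than a set period, rather than every branch of FIG. 5, and the HTML is passed in as a string so that no network access is needed.

```python
from datetime import date, timedelta
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_heading(html: str) -> str:
    """Return the page title with whitespace collapsed (assumed heading)."""
    parser = TitleParser()
    parser.feed(html)
    return " ".join("".join(parser.title_parts).split())


def needs_refresh(
    entry: Optional[Tuple[str, date]], today: date, max_age: timedelta
) -> bool:
    """True when no heading is stored or the last check is older than max_age."""
    if entry is None:
        return True
    _, last_checked = entry
    return today - last_checked > max_age


page = "<html><head><title> Example  Site </title></head><body>x</body></html>"
heading = extract_heading(page)
```

A heading that needs refreshing would be re-extracted from the freshly read HTML and written back with the current date.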

In the URL standardization process, standardizing URLs that have different domain names but represent the same site into a single URL improves the accuracy of the relevance calculation. The same-URL determination process does this: when different domain names represent the same site, the duplicate domain DB is built from <domain name, representative domain name> entries. The process is shown in FIG. 6.

Referring to FIGS. 1 and 6, the determination target domains are first selected manually or automatically (step S510). In automatic selection, domains are chosen as targets when a particular word in their domain names is the same. For example, 'kr.yahoo.com' and 'yahoo.co.kr' share the word 'yahoo' preceding the delimiters '.com' and '.co.kr', so they are included as determination targets. In manual selection, an administrator enters targets directly into the determination target DB as needed. Because the content of some sites, such as news sites, changes over time, after the target sites d1 and d2 have been processed, any other site d3 in the determination target DB that is to be compared against the same d1 is processed first.

Next, it is checked whether the end of the data has been reached (step S520). If so, execution is complete; if not, the HTML content is read from sites d1 and d2 (step S530).

Next, it is checked whether the target domains d1 and d2 are the same (step S540). If they differ, the process returns to step S410; if they are the same, the duplicate domain DB is updated (step S550). In the comparison of step S540, the contents of the two URLs are treated as the same if the degree to which the two files' contents match is at or above a predetermined level.
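Candidate selection (step S510) and the content comparison (step S540) can be sketched as follows. The specification does not name a similarity measure; the use of `difflib.SequenceMatcher`, the 0.9 threshold, the delimiter list `SUFFIXES`, and the function names are all assumptions made for illustration, and the HTML contents are passed in as strings rather than fetched.

```python
from difflib import SequenceMatcher
from typing import Optional

# Delimiters taken from the example in the text; a real list would be longer.
SUFFIXES = (".co.kr", ".com")


def core_word(domain: str) -> Optional[str]:
    """Return the word immediately before the delimiter, e.g. 'yahoo'."""
    for suffix in SUFFIXES:
        if domain.endswith(suffix):
            return domain[: -len(suffix)].split(".")[-1]
    return None


def is_candidate_pair(d1: str, d2: str) -> bool:
    """S510 (automatic selection): two domains sharing the same core word."""
    word = core_word(d1)
    return d1 != d2 and word is not None and word == core_word(d2)


def same_site(html1: str, html2: str, threshold: float = 0.9) -> bool:
    """S540: treat the contents as identical at or above the threshold."""
    return SequenceMatcher(None, html1, html2).ratio() >= threshold


candidate = is_candidate_pair("kr.yahoo.com", "yahoo.co.kr")
```

When `same_site` holds for a candidate pair, a <domain name, representative domain name> entry would be added to the duplicate domain DB; how the representative is chosen is left open here.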

The above describes extracting a certain degree of relevance to the Internet web site the user is visiting and providing related site information; however, if no such relevance is found, the service may not be provided to the user.

As mentioned above, standardizing URLs that have different domain names but represent the same site into a single URL in the URL standardization process improves the accuracy of the relevance calculation.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that it can be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

As described above, according to the present invention, relevance rankings for Internet sites are built from Internet bookmark DB information classified directly by users, so that sites highly related to the Internet site a user is currently visiting can be recommended to the user in real time.

Claims

A method for providing related web sites, comprising:

(a) normalizing the URL of each web site from an Internet bookmark DB to generate one or more standard URLs;

(b) extracting the relations between web sites from the standard URLs to generate one or more related URLs;

(c) visiting each URL through a server agent to generate one or more related URL headings; and

(d) retrieving sites highly related to the Internet web site the user is currently visiting from the one or more related URLs generated in step (b) and the one or more related URL headings generated in step (c), and providing them to the user in real time.

The method of claim 1, wherein step (a) comprises:

(a-1) checking whether the service of the URL is HTTP/HTTPS and, if it is not, excluding the URL from normalization;

(a-2) if the service is HTTP/HTTPS in step (a-1), removing the service name and checking whether a representative domain exists in a duplicate domain DB;

(a-3) if the representative domain exists in step (a-2), replacing the domain name in the URL with the representative domain;

(a-4) if the representative domain does not exist in step (a-2), or after step (a-3), checking whether the final file name of the URL is a default web file;

(a-5) if the final file name of the URL is a default web file in step (a-4), removing the final file name from the URL; and

(a-6) if the final file name of the URL is not a default web file in step (a-4), or after the final file name has been removed in step (a-5), removing the directory marker to generate a standard URL.
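Steps (a-1) to (a-6) can be sketched as follows. This is a minimal illustration, not the claimed implementation: the list of default web files, the reading of "directory marker" as a trailing slash, the dropping of query strings, and the names `normalize_url` and `duplicate_domains` are all assumptions.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumed examples of default web files; the claim does not enumerate them.
DEFAULT_WEB_FILES = {"index.html", "index.htm", "default.html", "default.htm"}


def normalize_url(url: str, duplicate_domains: dict) -> Optional[str]:
    """Return a standard URL, or None when the URL is excluded from normalization.

    `duplicate_domains` maps a domain name to its representative domain
    (a stand-in for the duplicate domain DB).
    """
    parts = urlsplit(url)

    # (a-1) Only HTTP/HTTPS URLs are normalized.
    if parts.scheme.lower() not in ("http", "https"):
        return None

    # (a-2), (a-3) Drop the service name and substitute the representative
    # domain when one is registered.
    domain = parts.netloc.lower()
    domain = duplicate_domains.get(domain, domain)

    # (a-4), (a-5) Remove the final file name when it is a default web file.
    # The query string and fragment are ignored here (an assumption).
    path = parts.path
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment.lower() in DEFAULT_WEB_FILES:
        path = path[: len(path) - len(last_segment)]

    # (a-6) Remove the directory marker, read here as a trailing slash.
    path = path.rstrip("/")

    return domain + path
```

Under these assumptions, `http://WWW.Example.com/index.html` with `www.example.com` registered as a duplicate of `example.com` normalizes to `example.com`, and an `ftp://` URL yields `None`.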

The method of claim 1, wherein step (b) comprises:

(b-1) extracting all folders to which a first URL, which is a standard URL, belongs;

(b-2) extracting second URLs belonging to all the folders obtained in step (b-1);

(b-3) extracting distinct third URLs from the second URLs obtained in step (b-2);

(b-4) for every URL among the third URLs, extracting the frequency with which it appears among the second URLs; and

(b-5) sorting all the URLs obtained in step (b-4) by frequency and storing the top N URLs and their frequencies.
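Steps (b-1) to (b-5) amount to counting how often other URLs share a bookmark folder with the first URL. The sketch below assumes the standard URL DB can be read as `(folder_id, url)` pairs; that layout, the function name `related_urls`, the exclusion of the first URL from its own results, and the alphabetical tie-break are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def related_urls(bookmarks, first_url, top_n=10):
    """Return up to top_n (url, frequency) pairs related to first_url.

    `bookmarks` is an iterable of (folder_id, standard_url) pairs.
    """
    folders_of = defaultdict(set)   # url -> folders containing it
    urls_in = defaultdict(set)      # folder -> urls it contains
    for folder_id, url in bookmarks:
        folders_of[url].add(folder_id)
        urls_in[folder_id].add(url)

    # (b-1) All folders that contain the first URL.
    folders = folders_of.get(first_url, set())

    # (b-2) Second URLs: every other URL in those folders, one entry per folder.
    second = [u for f in folders for u in urls_in[f] if u != first_url]

    # (b-3), (b-4) The Counter keys are the distinct third URLs and the
    # values are their frequencies among the second URLs.
    freq = Counter(second)

    # (b-5) Sort by descending frequency (ties broken by URL) and keep the top N.
    ranked = sorted(freq.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]
```

With folders `{1: a, b, c}`, `{2: a, b}` and `{3: d}`, the URLs related to `a` are `b` with frequency 2 followed by `c` with frequency 1; `d` never shares a folder with `a` and is not returned.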

The method of claim 1, wherein step (c) comprises:

(c-1) reading a first URL from the one or more related URLs, checking whether the end of the data has been reached, and ending if it has;

(c-2) if the end of the data has not been reached in step (c-1), checking whether the first URL is new, and returning to step (c-1) if it is not new;

(c-3) reading HTML from an Internet web site if the first URL is new in step (c-2);

(c-4) extracting the headword from the HTML read in step (c-3); And

(c-5) updating the extracted headwords.
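The loop in steps (c-1) to (c-5) can be sketched as follows. The claim does not say where in the HTML the headword comes from, so this sketch assumes the page `<title>`; the names `build_headwords`, `fetch_html`, and `headword_db`, the `http://` prefix, and the 10-second timeout are likewise hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text inside <title>, used here as the headword (an assumption)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_headword(html: str) -> str:
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_html(url: str) -> str:
    """Fetch a page for a standard URL, which carries no service name."""
    with urlopen("http://" + url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def build_headwords(related_urls, headword_db: dict, fetch=fetch_html) -> dict:
    for url in related_urls:            # (c-1) the loop ends at the end of data
        if url in headword_db:          # (c-2) not new: go on to the next URL
            continue
        html = fetch(url)               # (c-3) read the HTML from the web site
        headword_db[url] = extract_headword(html)  # (c-4), (c-5)
    return headword_db
```

The `fetch` parameter is injectable so the loop can be exercised with canned HTML instead of live requests.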

A system for providing related web sites, comprising:

an Internet bookmark DB that stores URLs of one or more web sites;

a URL normalization unit that builds a standard URL DB by normalizing the URLs stored in the Internet bookmark DB;

a related URL extraction unit that builds a related URL DB by extracting relations between the respective web sites from the standard URL DB;

a URL headword construction unit that visits each URL through a server agent, extracts headwords of the related URLs, and builds a related URL headword DB; and

a service providing unit that, based on the information stored in the related URL DB and the URL headword DB, provides the user with at least one site highly relevant to the Internet web site the user is currently visiting.

The system of claim 5, wherein the service providing unit provides the at least one site in real time.

The system of claim 5, wherein the URL normalization unit comprises:

a determination target DB;

a duplicate domain DB that stores a representative domain name for domains that have different domain names but actually designate the same web site;

a same-URL determination unit that extracts, from the domain names contained in the URLs in the bookmark DB, the domain names that need to be checked for identity, stores them in the determination target DB, reads target domains from the determination target DB, checks over the Internet whether they represent the same site, and updates the information stored in the duplicate domain DB;

a URL normalization unit that extracts URLs in a standard format by referring to the bookmark DB and the duplicate domain DB; and

a standard URL DB that stores each URL converted into the standard format by the URL normalization unit together with the folder unique number of that URL.

The system of claim 5, wherein the related URL extraction unit comprises:

a URL relation extraction unit that reads the URL and folder information generated by the URL normalization unit, ranks the related URLs by relevance for each URL, and outputs <URL, degree of relevance> information for the top N; and

a related URL DB that stores the information output by the URL relation extraction unit.

The system of claim 5, wherein the URL headword construction unit comprises:

a URL headword extraction unit that, when a headword has not been created for a first URL in the related URL DB built by the related URL extraction unit, or when the headword of the first URL is to be reconfirmed after a certain period has elapsed, reads the URL over the Internet and extracts a second URL headword; and

a URL headword DB that stores the second URL headword.

The system of claim 5, wherein the service providing unit comprises a web server that receives a current URL from a user and provides URL information related thereto to the user.