KR20110008980A

KR20110008980A - Apparatus and method for integration search of web site without redundancy information

Info

Publication number: KR20110008980A
Application number: KR1020090066566A
Authority: KR
Inventors: 신한진
Original assignee: 신한진
Priority date: 2009-07-21
Filing date: 2009-07-21
Publication date: 2011-01-27

Abstract

PURPOSE: An integrated website search apparatus and method that eliminate redundant information from the results returned by search engines are provided, so that the user receives more varied search results drawn from a plurality of search engines. CONSTITUTION: A search word analysis unit (120) performs morphological analysis of a search word. A keyword generator (130) generates, based on the morphological analysis result, a keyword for searching in the search engines and records an automatic execution command with it. A search word transmission unit (140) transmits the keyword, together with its automatic execution command, to the search engines. A search information receiving unit (150) receives from the search engines the information found on the information providing servers using the keyword.

Description

Apparatus and method for integration search of web site without redundancy information

The present invention relates to a website search system and, more particularly, to an integrated website search apparatus and method that exclude duplicate information from the retrieved results, thereby simplifying and diversifying the search results presented to the user and enabling a simple and quick search.

The Internet is expanding faster than we can imagine, because not only companies but also individuals are creating homepages to publicize themselves. Many people present their interests on their homepages in fields such as the arts, literature, academia, medicine, entertainment, sports, politics, science and technology, industry, and economics, and the daily news stories pouring in from around the world are discussed, from trivial details to specialist content, in newsgroups formed by the people interested in each field.

As the content posted on the Internet diversifies, the amount of information available online is snowballing. Selecting only the information a user wants from this rapidly growing body of information therefore requires some know-how, and it is search engines (Yahoo, Allianz, Naver, Empas, Paran, Nate, Daum, Google, etc.) that make this possible.

A search engine is a function that helps a user find, among the countless sites on the Internet, the sites that contain the information being sought, and hundreds of search engines exist on the Internet. Although no official criteria for classifying search engines have been established, they are generally divided by mode of operation into subject (directory) search engines and keyword search engines, and keyword search engines are further divided into general keyword search engines, front-end search engines, and intelligent search engines.

However, not all search engines fall neatly into these categories: Yahoo (trade name), a byword for subject search engines, also supports keyword search, and most keyword search engines also provide subject-based search services.

A subject search engine provides a list of Internet information classified under broad subjects such as society, culture, the arts, sports, and politics. Because it presents the information for each subject as a list, it is also called a directory server, subject catalog, or menu search, and its advantage is that information can be reached easily even when no specific subject term or key term for finding it can be identified. Its drawback is that reaching the desired information takes several steps ("major category → middle category → minor category → target information"), so a wrong choice partway through can lead further and further away from what the user is looking for.

A keyword search engine builds its own database of the contents and URLs (homepage addresses) of homepages on the Internet, and has the advantage that the desired information can be found quickly by entering only a few keywords (search terms). Its disadvantage is that when an exact keyword cannot be identified, a keyword search may return irrelevant results and waste a great deal of time.

Each search engine thus collects data differently, and the websites registered in its database often change or lose their uniform resource locator (URL), domain name, or IP address over time, so it is very difficult for a user to find all the desired information through a single search engine. Users therefore usually search with several search engines, which requires visiting each engine and re-entering the same search word. Visiting the portal site that operates each search engine and re-entering the same search word every time is inconvenient and costs considerable time and effort.

In addition, in the search method that conventional search engines provide to users, as shown in FIG. 1, the search system (200) of a search engine portal site such as Yahoo, Empas, or Google accesses the databases in the search systems (300) of various other sites, retrieves the desired data, and outputs the results on the portal site for users to review. Because a conventional search engine presents its results by taking the result values obtained from the search systems (300) of the various related databases as they are and merely editing them so that all values are displayed together, the same results are often presented more than once. Users are therefore given an enormous number of results, a large share of which are duplicates, so they must check the same data several times and need considerable time and patience to find the result they want.

In short, to find the desired information with the conventional integrated search method, the user must visit several different search engines and re-enter the same search word in each. Moreover, the many results that each search engine extracts from multiple databases are delivered to the user after only simple editing, without a separate classification step or a step that excludes duplicates, so the same data is presented repeatedly.

As a result, users must spend considerable time and effort searching for the material they want through search engine portal sites, and their demand for simpler and faster search is not adequately met.

The present invention has been devised to solve these problems. Its object is to provide an integrated website search apparatus and method with duplicate information removed, which integrate the information retrieved by each of a plurality of pre-registered search engines without the user having to enter the search word into each target search engine individually, and which exclude duplicate information from the integrated results before presenting them to the user.

To achieve the above object, an integrated website search apparatus with duplicate information removed according to the present invention comprises: a search input unit for inputting a search word for searching for information through at least one selected search engine; a search word analysis unit for performing morphological analysis of the input search word; a keyword generation unit for generating, from the search word as reconstructed by the morphological analysis, a keyword for searching in a search engine, and for recording an automatic execution command with the generated keyword; a search word transmission unit for receiving the keyword recorded with the automatic execution command and transmitting it to the at least one selected search engine; an integrated search information receiving unit for receiving, in integrated form, the information retrieved from the information providing servers by each of the at least one search engine using the input keyword; a duplicate information removal unit for removing duplicate information from the integrated search information; and a search information output unit for displaying the search information from which duplicate information has been excluded.

Preferably, the duplicate information removal unit compares the uniform resource locators (URLs), domain names, or IP addresses in the integrated search information and removes the entries that are identical in these values.

Preferably, the automatic execution command causes the search engine to automatically execute a search of the information providing servers according to the keyword.

To achieve the above object, an integrated website search method with duplicate information removed according to the present invention comprises: (A) entering a search word in a search word input window and selecting at least one of a plurality of configured search engines; (B) morphologically analyzing the input search word; (C) generating, based on the morphologically analyzed search word, a keyword for searching in the selected search engine, and recording with the generated keyword an automatic execution command that causes the search engine to execute the search automatically as soon as the keyword is input; (D) receiving, in integrated form, the search information found through the at least one search engine selected based on the keyword; (E) removing duplicate information from the integrated search information; and (F) displaying on a screen the search information from which duplicate information has been excluded.

Preferably, step (A) comprises: selecting one of two major categories, domestic portal search engines and overseas portal search engines; if domestic portal search engines are selected, selecting at least one of the preset domestic search engines in the corresponding subcategory; and if overseas portal search engines are selected, selecting at least one of the preset overseas search engines in the corresponding subcategory.
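The two-level choice in step (A) can be modeled as a small registry. The following Python sketch is illustrative only: the dictionary layout and the function name `select_engines` are assumptions, and the engine names are simply those listed in the specification.

```python
# Hypothetical two-level registry mirroring the domestic / overseas grouping
# of step (A); engine names are those listed in the specification.
ENGINE_CATALOG: dict[str, list[str]] = {
    "domestic": ["Naver", "Empas", "Paran", "Nate", "Daum",
                 "Google Korea", "Yahoo Korea"],
    "overseas": ["Google", "Yahoo", "Ask", "Altavista",
                 "sohu.com", "qq.com", "sina.com",
                 "google.co.jp", "goo.ne.jp", "excite.co.jp"],
}


def select_engines(category: str, chosen: list[str]) -> list[str]:
    """Pick one major category, then at least one engine inside it."""
    if category not in ENGINE_CATALOG:
        raise ValueError(f"unknown category: {category!r}")
    available = ENGINE_CATALOG[category]
    unknown = set(chosen) - set(available)
    if unknown:
        raise ValueError(f"not in {category!r}: {sorted(unknown)}")
    # Preserve catalog order; repeated picks collapse to one entry.
    selected = [name for name in available if name in chosen]
    if not selected:
        raise ValueError("select at least one search engine")
    return selected
```

For example, `select_engines("domestic", ["Daum", "Naver"])` returns `["Naver", "Daum"]`, while asking for Naver under the overseas category is rejected.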

Preferably, step (E) comprises: comparing the uniform resource locators (URLs), domain names, or IP addresses in the integrated search information; and, based on the comparison, removing all but one of any set of duplicate entries that share at least one of the URL, domain name, or IP address.
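Step (E) can be sketched as a single pass that keeps the first hit it sees and drops any later hit sharing a URL, domain name, or IP address with a hit already kept. This is a minimal illustration under stated assumptions: the `SearchHit` record, the URL normalization rules, and the premise that IP addresses were resolved before this step are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class SearchHit:
    engine: str
    url: str
    title: str
    ip: Optional[str] = None  # assumed to be resolved earlier; None if unknown


def normalize_url(url: str) -> str:
    """Canonical form for equality checks: lower-case scheme and host,
    fragment dropped, trailing slash removed (assumed rules)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def dedupe_by_address(hits, keys=("url", "domain", "ip")):
    """Keep the first hit; drop any later hit that shares at least one
    compared key (URL, domain name, IP address) with a hit already kept."""
    seen = {key: set() for key in keys}
    kept = []
    for hit in hits:
        values = {
            "url": normalize_url(hit.url),
            "domain": urlsplit(hit.url).hostname or "",  # already lower-cased
            "ip": hit.ip or "",
        }
        if any(values[k] and values[k] in seen[k] for k in keys):
            continue
        kept.append(hit)
        for k in keys:
            if values[k]:
                seen[k].add(values[k])
    return kept
```

Comparing on the domain name, as the text describes, collapses every hit from one site into a single entry; the `keys` parameter lets a caller restrict the comparison to, say, URLs only.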

Preferably, step (E) further comprises, after removing all but one of the duplicate entries that share at least one of the URL, domain name, or IP address: converting each remaining search result, whether a subject list or the contents of each list, into text and comparing the texts; and, based on the comparison, removing all but one of any set of text-identical duplicate entries.
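The text-based second pass can be sketched the same way: reduce each remaining hit to a normalized text fingerprint and keep only the first hit per fingerprint. The normalization steps (Unicode NFKC, lower-casing, punctuation removal) and the function names are assumptions; the patent states only that text-identical duplicates are removed.

```python
import re
import unicodedata


def text_fingerprint(title: str, snippet: str) -> str:
    """Reduce a hit's visible text to a comparable string: NFKC form,
    lower case, punctuation replaced by spaces, whitespace collapsed."""
    text = unicodedata.normalize("NFKC", f"{title} {snippet}").lower()
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so Hangul survives
    return " ".join(text.split())


def dedupe_by_text(hits):
    """Keep the first hit for each identical fingerprint.
    `hits` is any sequence of (title, snippet) pairs."""
    seen, kept = set(), []
    for title, snippet in hits:
        fingerprint = text_fingerprint(title, snippet)
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        kept.append((title, snippet))
    return kept
```

Exact matching on a fingerprint is the strictest reading of "identical"; a similarity threshold would catch near-duplicates but is not described in the patent.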

As described above, the integrated website search apparatus and method with duplicate information removed according to the present invention have the following effects.

First, when the same search word is searched in multiple search engines, the information the user seeks is not retrieved through one specific search engine; instead, the results retrieved by each of a plurality of pre-registered search engines are integrated and provided, so more varied and more numerous search results can be offered.

Second, because duplicate information is excluded from the integrated results retrieved by the plurality of pre-registered search engines, users can easily find the desired information among the many results and can therefore review more results quickly and conveniently.

Other objects, features, and advantages of the present invention will become apparent from the following detailed description of embodiments with reference to the accompanying drawings.

A preferred embodiment of the integrated website search apparatus and method with duplicate information removed according to the present invention is described below with reference to the accompanying drawings. The present invention is not, however, limited to the embodiment disclosed below and may be implemented in various different forms; this embodiment is provided only to make the disclosure of the present invention complete and to fully convey the scope of the invention to those of ordinary skill in the art. Accordingly, the embodiment described in this specification and the configurations shown in the drawings are merely the most preferred embodiment of the present invention and do not represent its entire technical idea, and it should be understood that various equivalents and modifications capable of replacing them may exist as of the filing date of this application.

FIG. 2 is a block diagram schematically showing the overall structure, including the integrated website search apparatus with duplicate information removed, according to an embodiment of the present invention. The structure comprises an integrated website search apparatus (100), search engines (200), and information providing servers (300), connected to one another through the Internet.

Here, the search engines (200) are websites, such as Naver, Empas, Paran, Nate, Daum, Google, Yahoo, and Ask, that help users find sites containing the information they are looking for. They are divided into domestic portal search engines (Naver, Empas, Paran, Nate, Daum, Google Korea, Yahoo Korea, etc.) and overseas portal search engines (English: Google, Yahoo, Ask, Altavista; Chinese: sohu.com, qq.com, sina.com; Japanese: google.co.jp, goo.ne.jp, excite.co.jp, etc.).

The information providing servers (300) are servers that store the information the user seeks and that is found through the search engines (200), such as websites, dictionaries, knowledge search, cafes (online communities), blogs, images, videos, music, news, specialized materials, and web pages.

Accordingly, the integrated website search apparatus (100) analyzes the input search word through a simple editing process (spacing, sentence segmentation, word segmentation, etc.) to generate a keyword, transmits the generated keyword to a plurality of pre-registered search engines (200) so that each search engine (200) selected by the user automatically searches the corresponding information providing servers (300), and then integrates the results and provides them to the user. Before providing the integrated search information to the user, the integrated website search apparatus (100) first excludes duplicate information based on the URL, domain name, and IP address. It may additionally convert each subject list or the contents of each list into text, compare the texts, and exclude text-identical duplicates.

FIG. 3 is a detailed block diagram of the integrated website search apparatus with duplicate information removed according to an embodiment of the present invention, and FIG. 4 is a screen capture of an output browser screen showing integrated website search results according to an embodiment of the present invention.

As shown in FIGS. 3 and 4, the integrated website search apparatus (100) includes a search word input unit (110), which provides an input window (600) in which the user can enter a search word for finding the desired information through the selected search engines (200), and a search word analysis unit (120), which analyzes the search word entered through the search word input unit (110) by morphological analysis so that the grammatical function of each part can be identified.

The apparatus further includes a keyword generation unit (130), which generates a keyword for searching in the search engines (200) from the search word as reconstructed by the morphological analysis in the search word analysis unit (120) and records an automatic execution command with the generated keyword, and a search word transmission unit (140), which receives from the keyword generation unit (130) the keyword recorded with the automatic execution command and transmits it to the search engines (200) selected by the user. That is, together with the generated keyword, the keyword generation unit (130) records an execution command that causes each user-selected search engine (200) to execute the search automatically as soon as the keyword is input. The automatic execution command thus sets the search engine to automatically search the information providing servers (300) according to the keyword.

The apparatus further includes an integrated search information receiving unit (150), which receives, in integrated form, the information retrieved from the information providing servers (300) by the plurality of search engines (200) using the input keyword; a duplicate information removal unit (160), which compares the URLs, domain names, and IP addresses in the integrated search information and removes the entries that are identical; and a search information output unit (170), which displays the search information from which the duplicate information removal unit (160) has excluded duplicates. The search information output unit (170) displays the search information grouped by subject or by keyword. In addition, the duplicate information removal unit (160) converts each retrieved subject list or the contents of each list into text, compares the texts, and removes any text-identical duplicates that remain.

The operation of the integrated website search apparatus with duplicate information removed according to the present invention, configured as described above, is now described in detail with reference to the accompanying drawings. Reference numerals identical to those in FIGS. 2 to 4 denote identical members performing identical functions.

FIG. 5 is a flowchart illustrating the integrated website search method with duplicate information removed according to an embodiment of the present invention.

Referring to FIG. 5, the user first enters a search word for finding the desired information into the input window (600) of the search word input unit (110) and selects at least one of the plurality of configured search engines (500) (S10). To select the search engines (200), the user first chooses one of the two major categories (400), domestic portal search engines or overseas portal search engines, and then selects at least one of the search engines (500) listed in the subcategory for that category. As shown in FIG. 4, when the domestic portal search engine category (400) is selected, Naver, Empas, Paran, Nate, Daum, Google Korea, Yahoo Korea, and so on are displayed as the subcategory search engines (500); when the overseas portal search engine category (400) is selected, the search engines (500) displayed are, for English, Google, Yahoo, Ask, and Altavista; for Chinese, sohu.com, qq.com, and sina.com; and for Japanese, google.co.jp, goo.ne.jp, and excite.co.jp.

Next, the integrated website search apparatus (100) morphologically analyzes the input search word through the search word analysis unit (120) (S20). For reference, Korean is an agglutinative language: an eojeol (the space-delimited unit that makes up a sentence) consists of a lexical morpheme combined with grammatical morphemes, and the grammatical morphemes indicate the grammatical function within the sentence. To know the grammatical function of a word in a sentence, the eojeol must therefore first be separated into morphemes. This step is essential in any application system based on machine translation, information retrieval, or other Korean language processing. Because the minimum unit required for the analysis is the morpheme, it is called morphological analysis. Since morphological analysis methods are well known, a detailed description is omitted.
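The patent leaves the analyzer unspecified. As a rough illustration only, the sketch below imitates the simplest part of the job for Korean: splitting the query into eojeol and stripping one trailing particle. The particle list is a small hypothetical sample and the function name is invented; a dictionary-based analyzer would be needed in practice, because naive suffix stripping also clips stems that merely end in a particle-like syllable (for example 나이 becomes 나).

```python
# Hypothetical, deliberately small sample of Korean postpositional particles.
# Two-syllable particles come first so that, e.g., "으로" is tried before "로".
PARTICLES = ("에서", "으로", "에게", "까지", "부터",
             "을", "를", "이", "가", "은", "는", "의", "에", "로", "와", "과", "도")


def split_morphemes(query: str) -> list[str]:
    """Crude stand-in for morphological analysis: split the query into
    eojeol on whitespace, then strip one trailing particle from each."""
    stems = []
    for eojeol in query.split():
        for particle in PARTICLES:
            # Never strip the whole eojeol away.
            if eojeol.endswith(particle) and len(eojeol) > len(particle):
                eojeol = eojeol[: -len(particle)]
                break
        stems.append(eojeol)
    return stems
```

With this sketch, `split_morphemes("서울의 맛집을 추천")` yields the stems 서울, 맛집, and 추천, which is the kind of particle-free input the keyword generation step (S30) would work from.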

Based on the morphologically analyzed search word, the keyword generation unit (130) generates a keyword for searching in the search engines (200) and records with it an automatic execution command that causes the search engine (200) to execute the search automatically as soon as the keyword is input (S30).

The generated keyword is then automatically entered into the search word input window of each search engine (200) selected by the user, and the automatic execution command recorded with the keyword causes search information to be retrieved based on that keyword (S40).

A preferred embodiment of the automatic execution command is as follows. Every search engine (200) is set up so that, when its web page is first opened, the input cursor is placed in its own search word input window. The program therefore first copies the generated keyword and then performs a paste as the website opens. Because the input cursor of the newly opened page is already in the search word input window, the paste command places the copied keyword in that window. The automatic execution command is recorded so that an Enter command is then issued programmatically, causing the search engine to search the information providing servers (300) corresponding to the keyword. The search word transmission unit (140) transmits the keyword recorded with the automatic execution command by the keyword generation unit (130) to each selected search engine (200), and each search engine (200) then retrieves, from the plurality of information providing servers (300), the content matching the transmitted keyword.
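The copy, open, paste, Enter sequence can be modeled as a keyword bundled with an ordered action list and replayed against a page whose search box gains focus on load. In this sketch the `AutoRunKeyword` and `FakePage` classes and the action labels are hypothetical stand-ins; a real implementation would drive an actual browser, which is outside the scope of this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AutoRunKeyword:
    """A keyword bundled with its automatic execution command (step S30).
    The action labels are hypothetical names for the steps in the text."""
    keyword: str
    engine_url: str
    actions: tuple[str, ...] = ("COPY", "OPEN", "PASTE", "ENTER")


@dataclass
class FakePage:
    """Stand-in for a browser page whose search box gets focus on load."""
    url: str = ""
    search_box: str = ""
    focused: bool = False
    submitted: list[str] = field(default_factory=list)


def run(cmd: AutoRunKeyword, page: FakePage) -> FakePage:
    """Replay the recorded actions in order against the page."""
    clipboard = ""
    for action in cmd.actions:
        if action == "COPY":
            clipboard = cmd.keyword
        elif action == "OPEN":
            # A freshly opened page puts the cursor in an empty search box.
            page.url, page.focused, page.search_box = cmd.engine_url, True, ""
        elif action == "PASTE" and page.focused:
            page.search_box += clipboard
        elif action == "ENTER" and page.focused:
            page.submitted.append(page.search_box)  # the engine runs the search
        else:
            raise RuntimeError(f"cannot perform {action!r} in current state")
    return page
```

Keeping the command as data rather than code means the same keyword record can be replayed against every selected engine; pasting before the page is open fails loudly instead of silently losing the keyword.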

The integrated search information receiving unit (150) then receives, in integrated form, all the search information sent back by the at least one search engine (200) for the keyword (S50).
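Step (S50) can be sketched as a fan-out over the selected engines followed by a merge. Here each engine is modeled as a plain Python callable that returns result URLs; that model, the 10-second timeout, and the choice to skip an engine that fails are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical engine model: keyword in, list of result URLs out.
Engine = Callable[[str], list[str]]


def integrated_receive(keyword: str, engines: dict[str, Engine]) -> list[tuple[str, str]]:
    """Query every selected engine in parallel and merge the results,
    tagging each with its source engine and keeping engine order."""
    merged: list[tuple[str, str]] = []
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {name: pool.submit(fn, keyword) for name, fn in engines.items()}
        for name, future in futures.items():
            try:
                results = future.result(timeout=10)
            except Exception:
                results = []  # one failing engine should not sink the whole search
            merged.extend((name, url) for url in results)
    return merged
```

The merged list still contains cross-engine duplicates on purpose; removing them is the job of step (S60).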

The duplicate information removing unit 160 then compares at least one of the uniform resource locator (URL), domain name, or IP address of the items in the integrated search information and removes the items found to be identical (S60). Specifically, among items that share the same URL, domain name, or IP address, only one is kept. The remaining search results are then converted to text, either as a per-subject list or as the contents of each list entry, and compared with one another. Of any items found to be identical in their text, again only one is kept.
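
The two-stage comparison in step S60 can be sketched in Python as follows. None of the following assumptions come from the patent:

- Hits are dicts with `url` and `text` keys.
- "Identical" means an exact match of the chosen address key.
- Text is normalized by lowercasing and collapsing whitespace before an exact comparison.
- The first occurrence is the one that is kept.

IP-address comparison would require a DNS lookup (for example, `socket.gethostbyname`), so it is omitted to keep the sketch offline. Note that `mode="domain"` keeps only one result per host, which is far more aggressive than URL matching.

```python
import re
from typing import Dict, List
from urllib.parse import urlsplit

Hit = Dict[str, str]


def address_key(url: str, mode: str) -> str:
    """Key for the first-stage comparison: the exact URL or its host name."""
    if mode == "url":
        return url
    if mode == "domain":
        return urlsplit(url).hostname or ""
    raise ValueError(f"unsupported mode: {mode}")


def normalize_text(text: str) -> str:
    """Assumed text normalization: collapse whitespace and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def remove_duplicates(hits: List[Hit], mode: str = "url") -> List[Hit]:
    # Stage 1: keep only the first hit for each URL (or domain).
    seen_addresses = set()
    stage1: List[Hit] = []
    for hit in hits:
        key = address_key(hit["url"], mode)
        if key not in seen_addresses:
            seen_addresses.add(key)
            stage1.append(hit)

    # Stage 2: among the survivors, keep only the first hit for each text.
    seen_texts = set()
    kept: List[Hit] = []
    for hit in stage1:
        text = normalize_text(hit["text"])
        if text not in seen_texts:
            seen_texts.add(text)
            kept.append(hit)
    return kept
```

Because both stages use sets of already-seen keys, each stage is a single linear pass over the results.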

Finally, the search information output unit 170 displays the search information, with duplicates excluded, on the screen (S70). The search information is displayed organized by subject or by keyword.
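
The grouping step can be sketched as shown below. The sketch assumes that each deduplicated hit carries a `title` and a `subject` or `keyword` field. These field names and the plain-text layout are hypothetical. The patent's own output screen appears in FIG. 4.

```python
from collections import defaultdict
from typing import Dict, List


def group_for_display(hits: List[Dict[str, str]], by: str = "subject") -> Dict[str, List[str]]:
    """Collect result titles under their subject (or keyword), keeping input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for hit in hits:
        groups[hit[by]].append(hit["title"])
    return dict(groups)


def render(groups: Dict[str, List[str]]) -> str:
    """Produce a simple text listing with headings sorted alphabetically."""
    lines: List[str] = []
    for heading in sorted(groups):
        lines.append(f"[{heading}]")
        lines.extend(f"  - {title}" for title in groups[heading])
    return "\n".join(lines)


if __name__ == "__main__":
    hits = [
        {"subject": "news", "title": "Jeju weather"},
        {"subject": "blog", "title": "Three days in Jeju"},
        {"subject": "news", "title": "Jeju airport update"},
    ]
    print(render(group_for_display(hits)))
```

Passing `by="keyword"` groups the same hits by the keyword that produced them.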

The user therefore still receives all of the content that each search engine 200 provides independently, such as Knowledge iN Q&A, café, mini-homepage, and blog posts. Search results that several search engines 200 draw from the same information providing servers 300 are delivered only once, after duplicates have been removed. The user thus receives a broader variety of material and can find the desired information among numerous search results more quickly and conveniently.

The technical idea of the present invention has been described in detail with reference to a preferred embodiment. It should be noted, however, that this embodiment is illustrative and not limiting. Those of ordinary skill in the art will understand that various other embodiments are possible within the scope of the technical idea of the present invention. The true scope of technical protection of the present invention should therefore be determined by the technical idea of the appended claims.

FIG. 1 is a block diagram illustrating a search method that a conventional search engine provides to users.

FIG. 2 is a block diagram schematically showing the overall structure, including the integrated website search apparatus with duplicate information removal, according to an embodiment of the present invention.

FIG. 3 is a block diagram showing in detail the integrated website search apparatus with duplicate information removal according to an embodiment of the present invention.

FIG. 4 is a screen capture of an output browser window showing integrated website search results according to an embodiment of the present invention.

FIG. 5 is a flowchart illustrating the integrated website search method with duplicate information removal according to an embodiment of the present invention.

*Description of reference numerals for the main parts of the drawings

100: integrated website search apparatus    110: search word input unit

120: search word analysis unit    130: keyword generator

140: search word transmitter    150: integrated search information receiving unit

160: duplicate information removing unit    170: search information output unit

200: search engine    300: information providing server

400: major category    500: subcategory

600: input window

Claims

A search word input unit for inputting a search word for searching for information through at least one selected search engine;

A search word analysis unit which performs morphological analysis on the input search word;

A keyword generator which reconstructs the search word based on the morphological analysis to generate a keyword for searching in a search engine, and records an auto-execution command with the generated keyword;

A search word transmitter which receives the keyword recorded with the auto-execution command and transmits it to the selected at least one search engine;

An integrated search information receiving unit which receives and integrates the information that the at least one search engine has retrieved from information providing servers using the keyword input to it;

A duplicate information removing unit which removes duplicate information from the integrated search information; and

A search information output unit which displays the search information from which the duplicate information has been excluded.

The apparatus of claim 1,

wherein the duplicate information removing unit compares at least one of the uniform resource locator (URL), domain name, or IP address in the integrated search information and removes the duplicate information found to be identical.

The apparatus of claim 2,

wherein the duplicate information removing unit converts the one or more retrieved items into text for each subject or list, compares them with one another, and additionally removes duplicate information that is identical in its text.

The apparatus of claim 1,

wherein the auto-execution command sets the search engine to automatically search the information providing servers according to the keyword.

The apparatus of claim 1,

wherein the search engine comprises a domestic portal search engine including at least one of Naver, Empas, Paran, Nate, Daum, Google Korea, and Yahoo Korea, and an overseas portal search engine including at least one of Google, Yahoo, Ask, AltaVista, sohu.com, qq.com, sina.com, google.co.jp, goo.ne.jp, and excite.co.jp.

The apparatus of claim 1,

wherein the integrated search information receiving unit integrates the information retrieved through the search engine from the information providing servers, the information being at least one of a website, dictionary, knowledge search, café, blog, image, video, engraving, news, specialized document, and web page.

(A) entering a search word in the search word input window and selecting at least one of a plurality of preset search engines;

(B) performing morphological analysis on the input search word;

(C) generating a keyword for searching in the selected search engine based on the morphologically analyzed search word, and recording with the generated keyword an auto-execution command that automatically executes the search as soon as the keyword is entered into the search engine;

(D) receiving and integrating the search information retrieved, based on the keyword, through the at least one selected search engine;

(E) removing duplicate information from the integrated search information; and

(F) displaying on the screen the search information from which the duplicate information has been excluded.

The method of claim 7, wherein step (A) comprises:

selecting, at the major-category level, either domestic portal search engines or overseas portal search engines;

when domestic portal search engines are selected, selecting at least one of the preset domestic search engines grouped in the corresponding subcategory; and

when overseas portal search engines are selected, selecting at least one of the preset overseas search engines grouped in the corresponding subcategory,

wherein the domestic search engines include at least one of Naver, Empas, Paran, Nate, Daum, Google Korea, and Yahoo Korea, and the overseas search engines include at least one of Google, Yahoo, Ask, AltaVista, sohu.com, qq.com, sina.com, google.co.jp, goo.ne.jp, and excite.co.jp.

The method of claim 8,

wherein the subcategories included in the domestic portal search engines and the overseas portal search engines can be changed by the user.

The method of claim 7, wherein step (E) comprises:

comparing at least one of the uniform resource locator (URL), domain name, or IP address in the integrated search information; and

based on the comparison, removing all but one of the items that share the same URL, domain name, or IP address.

The method of claim 10, wherein step (E) further comprises, after removing all but one of the items that share the same uniform resource locator (URL), domain name, or IP address:

converting the one or more retrieved items, as a per-subject list or as the contents of each list entry, into text and comparing them with one another; and

based on the comparison, removing all but one of the items that are identical in their text.