KR101544141B1

KR101544141B1 - System for grouping articles based on subject thereof

Info

Publication number: KR101544141B1
Application number: KR1020130119444A
Authority: KR
Inventors: 이경일; 최광선; 이장훈
Original assignee: Saltlux Co., Ltd. (주식회사 솔트룩스)
Priority date: 2013-10-07
Filing date: 2013-10-07
Publication date: 2015-08-12
Also published as: KR20150040658A

Abstract

An article grouping system is disclosed. An article grouping system according to an exemplary embodiment of the present invention may include a topic extraction unit that extracts article topics of a plurality of articles stored in a database, a group generation unit that groups the plurality of articles based on the article topics to generate a first group, and a group providing unit that provides the first group to a user.

Description

[0001] The present invention relates to a system for grouping articles based on their subject.

The technical idea of the present invention relates to an article grouping system and, more particularly, to a topic-based article grouping system that groups a plurality of articles using topics derived from the articles and provides the groups to a user.

The present invention is derived from research conducted as part of the Industrial Source Technology Development Program of the Ministry of Knowledge Economy, supervised by the Soongsil University Industry-Academic Cooperation Foundation and carried out jointly with Saltlux Co., Ltd. [Research period: 2013.03.01 to 2014.02.28; research title: Development of a mobile-platform-based planning and learning cognitive model framework technology; project number: 10035348]

With the development of the Internet, a wide variety of content, including online communities, movies, music, photographs, and documents, is provided over the Internet. Articles, a kind of content conventionally delivered through television, radio, magazines, or newspapers, are also provided over the Internet, and because of the accessibility and immediacy of the Internet, many Internet users read articles provided over it.

Meanwhile, not only do existing media outlets supply articles over the Internet, but Internet-based media outlets have also emerged, and the number of media outlets providing articles over the Internet is increasing. As this number grows, a vast quantity of articles generated in a short time is being provided to Internet users. Accordingly, a means is needed by which an Internet user can easily find and read the desired articles among a large number of articles.

The technical idea of the present invention relates to an article grouping system, and in particular to a topic-based article grouping system that extracts the topics of articles provided over the Internet, groups the articles according to the extracted topics, and provides the groups to a user.

To achieve the above object, an article grouping system according to an exemplary embodiment of the present invention may include a topic extraction unit that extracts article topics of a plurality of articles stored in a database, a group generation unit that groups the plurality of articles based on the article topics to generate a first group, and a group providing unit that provides the first group to a user through a network.

According to an exemplary embodiment of the present invention, each of the article topics may include at least one word, and the topic extraction unit may include a word extraction unit that extracts at least one key word from each of the plurality of articles and a topic generation unit that generates the article topic of each of the plurality of articles based on the key words.

According to an exemplary embodiment of the present invention, the word extraction unit may extract the key word based on the frequency of the words included in each of the plurality of articles.

According to an exemplary embodiment of the present invention, the group generation unit may include a first similarity calculation unit that calculates a first similarity between the article topics, a duplicate article processing unit that removes duplicate articles based on the first similarity and generates the first group, and a group topic generation unit that generates a group topic of the first group based on the first similarity.

According to an exemplary embodiment of the present invention, the duplicate article processing unit may selectively include a first article stored in the database in the first group based on the similarity between the group topic of the first group and the article topic of the first article.

According to an exemplary embodiment of the present invention, the duplicate article processing unit may divide the first group into two or more groups based on the first similarity and the number of articles included in the first group.

According to an exemplary embodiment of the present invention, the article grouping system may further include a representative article identification unit that identifies at least one representative article included in the first group based on the article topics of the articles included in the first group and the group topic of the first group.

According to an exemplary embodiment of the present invention, the representative article identification unit may include a second similarity calculation unit that calculates a second similarity between the article topics of the articles included in the first group and the group topic of the first group, and an article ranking unit that determines a ranking of the articles included in the first group based on the second similarity.

According to an exemplary embodiment of the present invention, the group providing unit may include a topic-based group providing unit that provides a plurality of groups including the first group to the user, and a representative article providing unit that provides a representative article included in the first group to the user.

According to an exemplary embodiment of the present invention, the representative article providing unit may further include an article view count measurement unit that measures the user's view counts for the articles included in the first group, and the article ranking unit may determine the ranking further based on the view counts.

With the article grouping system described above, a user can easily find and read the desired articles among a vast quantity of articles. In addition, the user can see major events at a glance and can easily read a plurality of articles related to the same topic.

FIG. 1 is a diagram illustrating an example of an article grouping system according to an exemplary embodiment of the present invention.
FIG. 2 is a diagram illustrating the configuration of an article grouping system according to an exemplary embodiment of the present invention.
FIG. 3 is a diagram illustrating an implementation example of the topic extraction unit of FIG. 1 according to an exemplary embodiment of the present invention.
FIG. 4 is a diagram illustrating the operation of the word extraction unit of FIG. 3 according to an exemplary embodiment of the present invention.
FIG. 5 is a diagram illustrating an implementation example of the group generation unit of FIG. 1 according to an exemplary embodiment of the present invention.
FIGS. 6A and 6B are diagrams illustrating operations in which the group generation unit of FIG. 1 groups articles according to exemplary embodiments of the present invention.
FIG. 7 is a diagram illustrating an implementation example of the representative article identification unit of FIG. 1 according to an exemplary embodiment of the present invention.
FIG. 8 is a diagram illustrating an implementation example of the group providing unit of FIG. 1 according to an exemplary embodiment of the present invention.
FIG. 9 is a diagram illustrating groups that the group providing unit of FIG. 1 provides to a user according to an exemplary embodiment of the present invention.
FIG. 10 is a diagram illustrating articles that the group providing unit of FIG. 1 provides to a user according to an exemplary embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings, with no intent other than to provide a thorough understanding of the invention to a person of ordinary skill in the art to which the invention pertains.

FIG. 1 is a diagram illustrating an example of an article grouping system according to an exemplary embodiment of the present invention. As shown in FIG. 1, an article grouping system 10, a plurality of databases 22 and 24, and a user 30 may exchange data with one another through a network 40. The user 30 may access the article grouping system 10 through the network 40. For example, the user 30 may receive groups of articles provided by the article grouping system 10. The article grouping system 10 may access the databases 22 and 24 through the network 40.

The databases 22 and 24 may store a plurality of articles. For example, the first database 22 may store a plurality of articles written by a first media outlet, and the second database 24 may store a plurality of articles written by a second media outlet different from the first media outlet. Articles newly written by the first and second media outlets may be added to the first and second databases 22 and 24, respectively. Meanwhile, the user 30 may read articles through the network 40 using a terminal capable of connecting to the network 40, such as a personal computer or a portable communication device.

As shown in FIG. 1, the article grouping system 10 may include a topic extraction unit 100, a group generation unit 200, a group providing unit 300, and a representative article identification unit 400. According to an exemplary embodiment of the present invention, the topic extraction unit 100 may access the articles stored in the databases 22 and 24 through the network 40 and extract the article topics of the articles. The group generation unit 200 may group the articles based on the article topics extracted by the topic extraction unit 100 to generate a first group. The group providing unit 300 may provide the first group generated by the group generation unit 200, and the articles included in the first group, to the user 30 through the network 40. The representative article identification unit 400 may identify at least one representative article among the articles included in the first group generated by the group generation unit 200 and may pass the identified representative article, or information about it, to the group providing unit 300. The group providing unit 300 may provide the representative article of the first group together with the first group to the user 30 through the network 40.

FIG. 2 is a diagram illustrating the configuration of an article grouping system according to an exemplary embodiment of the present invention. As shown in FIG. 2, the article grouping system 10 may include an internal database 500 accessible to the topic extraction unit 100, the group generation unit 200, the group providing unit 300, and the representative article identification unit 400. Referring also to FIG. 1, the internal database 500 may store some of the articles stored in the databases 22 and 24 in which media outlets store their articles, the topics of the articles extracted by the topic extraction unit 100, and the plurality of groups stored by the group generation unit 200 or information about those groups. For example, the group generation unit 200 may store an already generated first group in the internal database 500 and, based on the first group stored in the internal database 500, determine whether articles stored in the external databases 22 and 24 belong to the first group.

FIG. 3 is a diagram illustrating an implementation example of the topic extraction unit 100 of FIG. 1 according to an exemplary embodiment of the present invention. As shown in FIG. 3, the topic extraction unit 100 may include a word extraction unit 120 and a topic generation unit 140. Referring also to FIG. 1, an article topic extracted from an article stored in the databases 22 and 24 may include at least one word. The word extraction unit 120 may extract at least one key word from among the words included in an article. For example, the word extraction unit 120 may extract key words based on the frequency of the words included in the article, or may extract words included in the title as key words. To this end, the word extraction unit 120 may include a parser that analyzes the structure of a sentence and breaks it into individual units. The word extraction unit 120 may count word frequencies based on the units produced by the parser. In doing so, incidental elements of a sentence, such as grammatical particles (postpositions) and idiomatic expressions, may be excluded.

The topic generation unit 140 may generate the article topic of an article based on the key words extracted by the word extraction unit 120. According to an exemplary embodiment of the present invention, the topic generation unit 140 may analyze the key words extracted by the word extraction unit 120 and generate the article topic by arranging a predetermined number of words in a predetermined order. For example, when the real name of a person is extracted as a key word by the word extraction unit 120, the topic generation unit 140 may arrange the key words so that the real name comes first in the article topic. The topic generation unit 140 may also generate word vectors based on the frequency of the words included in an article. Word vectors are described in detail later.
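As a non-limiting illustration, the arrangement of key words into an article topic may be sketched as follows. The function name build_article_topic, the set person_names used to recognize real names, and the default of three words for the "predetermined number" are all hypothetical and are not taken from the specification; the only ordering rule assumed is the one stated above, namely that a real name comes first.

```python
def build_article_topic(key_words, person_names, max_words=3):
    """Order key words so that real names come first, then truncate.

    key_words: key words in the order they were extracted.
    person_names: hypothetical set of words known to be real names.
    max_words: hypothetical "predetermined number" of words per topic.
    """
    # sorted() is stable, so words keep their extracted order within
    # the "name" partition and within the "non-name" partition.
    ordered = sorted(key_words, key=lambda word: word not in person_names)
    return ordered[:max_words]


topic = build_article_topic(["PGA", "Hong Gil-dong", "win"], {"Hong Gil-dong"})
print(topic)  # ['Hong Gil-dong', 'PGA', 'win']
```

How real names are recognized is left open here; a dictionary lookup is used only to keep the sketch self-contained.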

Meanwhile, according to an exemplary embodiment of the present invention, the topic generation unit 140 may classify article topics according to the field of the articles stored in the databases 22 and 24. When storing articles in the databases 22 and 24, media outlets may classify them by field, for example politics, economy, society, sports, or entertainment. Accordingly, the topic generation unit 140 may include in the article topic a word indicating the field of the article as stored in the databases 22 and 24. A word indicating the field of the article, once included in the article topic, can prevent articles from different fields from being grouped into the same group.

FIG. 4 is a diagram illustrating the operation of the word extraction unit 120 of FIG. 3 according to an exemplary embodiment of the present invention. The two tables shown in FIG. 4 present data obtained by analyzing two articles, one table per article. As described with reference to FIG. 3, the word extraction unit 120 may extract key words based on the frequency of the words included in an article, and may also extract words included in the title as key words. In each table of FIG. 4, the left column lists a word, the middle column gives the number of times the word appears in the article, and the right column indicates whether the word is included in the title of the article. FIG. 4 shows the frequency and title inclusion for only five words in each article, but exemplary embodiments of the present invention are not limited thereto.

As shown in FIG. 4, in the first article A1, “Hong Gil-dong” appears five times, “PGA” and “win” appear three times each, and “California” and “golf” appear once each. In addition, “Hong Gil-dong”, “PGA”, and “win” are included in the title of the first article A1. In the second article A2, “Hong Gil-dong” appears six times, “PGA” four times, “4 times” twice, and “California” once, and “Hong Gil-dong”, “win”, “PGA”, and “4 times” are included in the title of the second article A2.

According to an exemplary embodiment of the present invention, the word extraction unit 120 may generate data such as the tables shown in FIG. 4 from the first article A1 and the second article A2, and may extract the key words of the first article A1 and the second article A2 based on the generated data. For example, the word extraction unit 120 may extract the words included in the title of an article as key words and may also extract the three or more most frequent words as key words. That is, in the example shown in FIG. 4, the word extraction unit 120 may extract “Hong Gil-dong”, “PGA”, and “win” as the key words of the first article A1, and “Hong Gil-dong”, “win”, “PGA”, and “4 times” as the key words of the second article A2.
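By way of example only, the key word extraction described for FIG. 4 may be sketched as below. The function name extract_key_words and the parameter top_n are hypothetical, and the rule "title words first, then the top_n most frequent words not already listed" is one possible reading of the description rather than the only one. The counts are those stated for the first article A1 and the second article A2.

```python
def extract_key_words(word_counts, title_words, top_n=3):
    """Return the title words followed by the top_n most frequent words.

    word_counts: mapping of word -> number of occurrences in the article.
    title_words: words that appear in the article title, in title order.
    top_n: hypothetical number of most frequent words to take.
    """
    key_words = list(title_words)
    # sorted() is stable, so ties keep the order of word_counts.
    most_frequent = sorted(word_counts, key=word_counts.get, reverse=True)[:top_n]
    for word in most_frequent:
        if word not in key_words:
            key_words.append(word)
    return key_words


a1_counts = {"Hong Gil-dong": 5, "PGA": 3, "win": 3, "California": 1, "golf": 1}
a2_counts = {"Hong Gil-dong": 6, "PGA": 4, "4 times": 2, "California": 1}

print(extract_key_words(a1_counts, ["Hong Gil-dong", "PGA", "win"]))
print(extract_key_words(a2_counts, ["Hong Gil-dong", "win", "PGA", "4 times"]))
```

Under these assumptions the sketch yields the same key words as the example above for both articles.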

FIG. 5 is a diagram illustrating an implementation example of the group generation unit 200 of FIG. 1 according to an exemplary embodiment of the present invention. As shown in FIG. 5, the group generation unit 200 may include a first similarity calculation unit 220, a duplicate article processing unit 240, and a group topic generation unit 260. Referring also to FIG. 3, the first similarity calculation unit 220 may calculate a first similarity between articles based on the article topics that the topic generation unit 140 of the topic extraction unit 100 generated for the plurality of articles. How the first similarity calculation unit 220 calculates the first similarity between articles is described in detail later.

The duplicate article processing unit 240 may process duplicate articles based on the first similarity calculated by the first similarity calculation unit 220 to generate the first group. The duplicate article processing unit 240 may group articles having similar topics based on the first similarity, and may then generate the first group by removing duplicate articles from among the grouped articles.

For a given event, the user 30 may prefer to read articles that cover the same event with different content rather than articles that repeat the same content. Accordingly, the duplicate article processing unit 240 may remove duplicate articles based on the first similarity calculated by the first similarity calculation unit 220. For example, when the first similarity between two articles exceeds a predetermined threshold, the duplicate article processing unit 240 may include only one of the two articles in the first group.
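The duplicate removal described above may, purely as an illustrative sketch, be written as a greedy pass over the articles. The function name remove_duplicates, the lookup helper, the threshold of 40, and every similarity value other than the 51 given for A1 and A2 are hypothetical. Which of two duplicate articles is kept is not specified in the description; this sketch simply keeps the one seen first.

```python
def remove_duplicates(articles, similarity, threshold):
    """Keep an article only if it is not a duplicate of one already kept.

    similarity: callable returning the first similarity of two articles.
    threshold: hypothetical "predetermined threshold" for duplicates.
    """
    kept = []
    for article in articles:
        if all(similarity(article, other) <= threshold for other in kept):
            kept.append(article)
    return kept


# Hypothetical pairwise similarities; only the A1-A2 value (51) comes
# from Equation 1 of the description.
scores = {
    frozenset(("A1", "A2")): 51,
    frozenset(("A1", "A6")): 12,
    frozenset(("A2", "A6")): 9,
}


def lookup(a, b):
    return scores[frozenset((a, b))]


print(remove_duplicates(["A1", "A2", "A6"], lookup, threshold=40))  # ['A1', 'A6']
```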

In addition, the duplicate article processing unit 240 may divide the first group into two or more subgroups based on the number of articles included in the first group. For example, when the number of articles included in the first group exceeds a predetermined number, the first similarity calculation unit 220 may divide the first group into two or more groups based on the similarity of the articles included in the first group. The articles included in the first group generated by the duplicate article processing unit 240 may be provided to the user 30 as articles having the same or similar topics.

The group topic generation unit 260 may generate a group topic for the first group generated by the duplicate article processing unit 240. For example, the group topic generation unit 260 may generate the group topic based on the article topics of the articles included in the first group. According to an exemplary embodiment of the present invention, the group topic of the first group generated by the group topic generation unit 260 may be used to determine whether articles stored in the databases 22 and 24 belong to the first group, to identify the representative article of the first group, or to determine whether the first group is similar to a second group different from the first group.

FIGS. 6A and 6B are diagrams illustrating operations in which the group generation unit 200 of FIG. 1 groups articles according to exemplary embodiments of the present invention. As described with reference to FIG. 5, the first similarity calculation unit 220 may calculate a first similarity between a plurality of articles, and the duplicate article processing unit 240 may group the articles according to the calculated first similarity to generate the first group. FIGS. 6A and 6B illustrate exemplary methods of calculating the first similarity for six articles A1 to A6 and show a first group G1 and a second group G2 being generated from the six articles A1 to A6.

FIG. 6A is a diagram illustrating an operation in which the first similarity calculation unit 220 and the duplicate article processing unit 240 group articles according to an exemplary embodiment of the present invention. Referring to FIG. 2, the data generated by the word extraction unit 120, for example the frequency of each word and whether it is included in the title, may also be used by the first similarity calculation unit 220 of the group generation unit 200 to calculate the first similarity between articles. For example, to determine the first similarity of the first article A1 and the second article A2, the first similarity calculation unit 220 may operate on the frequencies of those key words, among the key words extracted by the word extraction unit 120, that are common to the first article A1 and the second article A2. That is, in the example shown in FIG. 4, the first similarity calculation unit 220 may calculate the first similarity of the first article A1 and the second article A2 using the following equation.

[Equation 1]

(5×6) + (3×3) + (3×4) = 51

In Equation 1, the first term is the product of the numbers of occurrences of “Hong Gil-dong” in the first article A1 and in the second article A2, and the second and third terms are the corresponding products for “win” and “PGA”, respectively. The first similarity calculation unit 220 may also add or multiply a weight for key words included in the title. The higher the value calculated in this way, the higher the similarity between the articles may be judged to be.
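Equation 1 may be reproduced, as a minimal sketch, by summing the products of the occurrence counts of the key words shared by two articles. The function name first_similarity and the parameters title_words and title_weight are hypothetical; the description states only that a weight may be added or multiplied for title words, so the multiplicative form below is one assumed variant. The count of 3 for “win” in the second article A2 is inferred from the second term (3×3) of Equation 1.

```python
def first_similarity(counts_a, counts_b, key_words_a, key_words_b,
                     title_words=frozenset(), title_weight=1):
    """Sum, over shared key words, the product of their occurrence counts.

    title_words / title_weight: hypothetical multiplicative weighting of
    key words that appear in a title; a weight of 1 leaves terms unchanged.
    """
    shared = set(key_words_a) & set(key_words_b)
    total = 0
    for word in shared:
        term = counts_a.get(word, 0) * counts_b.get(word, 0)
        if word in title_words:
            term *= title_weight
        total += term
    return total


a1 = {"Hong Gil-dong": 5, "PGA": 3, "win": 3, "California": 1, "golf": 1}
# "win": 3 is inferred from the (3x3) term of Equation 1.
a2 = {"Hong Gil-dong": 6, "PGA": 4, "win": 3, "4 times": 2, "California": 1}
keys_a1 = ["Hong Gil-dong", "PGA", "win"]
keys_a2 = ["Hong Gil-dong", "win", "PGA", "4 times"]

print(first_similarity(a1, a2, keys_a1, keys_a2))  # 51
```

Because the sum is unnormalized, longer articles tend to score higher; the description does not state whether any normalization is applied.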

In FIG. 6A, the six articles A1 to A6 are represented as six nodes. Each of the six nodes may be connected through five edges to the five other nodes. For example, the node representing the first article A1 may be connected through five edges to the nodes representing the second article A2 through the sixth article A6. Each edge shown in FIG. 6A has its own value, and the value of an edge may represent the first similarity between the two articles it connects. For example, referring also to FIG. 2, the edge connecting the node representing the first article A1 and the node representing the second article A2 may have a value of 51 according to Equation 1.

The first similarity calculation unit 220 may calculate the first similarity corresponding to each of the edges shown in FIG. 6A. The duplicate article processing unit 240 may group the articles based on the values of the edges. For example, the duplicate article processing unit 240 may group articles whose edge value (that is, first similarity) is higher than a predetermined threshold. Accordingly, as shown in FIG. 6A, the duplicate article processing unit 240 may group the first article A1, the second article A2, and the sixth article A6 into the first group G1, and may group the second article A2 and the third article A3 together. As shown in FIG. 6A, the first group G1 and the second group G2 need not be mutually exclusive. The duplicate article processing unit 240 may then produce the final first group G1 and second group G2 by removing duplicate articles from among the articles included in each group.
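The description does not specify how edges above the threshold are turned into groups. One assumed interpretation, offered only as a sketch, is to keep the edges whose value exceeds the threshold and to treat each maximal set of mutually connected articles (a maximal clique, found here with the Bron-Kerbosch algorithm) as a group, which allows an article such as A2 to belong to two groups. The function names, the threshold of 40, and all edge values other than the 51 for A1 and A2 are hypothetical.

```python
def maximal_cliques(neighbors):
    """Yield every maximal clique of an undirected graph (Bron-Kerbosch)."""
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield clique
            return
        for node in list(candidates):
            yield from expand(clique | {node},
                              candidates & neighbors[node],
                              excluded & neighbors[node])
            candidates = candidates - {node}
            excluded = excluded | {node}
    yield from expand(set(), set(neighbors), set())


def group_by_threshold(articles, edge_values, threshold):
    """Group articles that are pairwise connected by edges above threshold."""
    neighbors = {article: set() for article in articles}
    for (a, b), value in edge_values.items():
        if value > threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Ignore articles that are not similar enough to any other article.
    groups = [sorted(c) for c in maximal_cliques(neighbors) if len(c) > 1]
    return sorted(groups)


articles = ["A1", "A2", "A3", "A4", "A5", "A6"]
# Hypothetical edge values; only A1-A2 = 51 comes from Equation 1.
edges = {("A1", "A2"): 51, ("A1", "A6"): 45, ("A2", "A6"): 48,
         ("A2", "A3"): 60, ("A3", "A4"): 10, ("A4", "A5"): 5}

print(group_by_threshold(articles, edges, threshold=40))
# [['A1', 'A2', 'A6'], ['A2', 'A3']]
```

With these assumed values the sketch reproduces the overlapping groups of FIG. 6A; other grouping rules, such as connected components, would merge A3 into the first group instead.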

FIG. 6B illustrates an operation in which the first similarity calculation unit 220 and the duplicate article processing unit 240 group articles according to an exemplary embodiment of the present invention. The first similarity calculation unit 220 may plot each article in a coordinate space according to the article topic generated by the topic generation unit 140. For example, in the two-dimensional coordinate space of FIG. 6B, the X axis represents the frequency of a first word and the Y axis represents the frequency of a second word. Each of the six articles A1 to A6 may appear as a single point in the two-dimensional coordinate space according to the frequency with which the first word and the second word, both included in the article topic generated by the topic generation unit 140, occur in that article (or the number of occurrences of the first word and the second word in that article). In other words, an article may be defined as a vector from the origin to its coordinates, and such a vector is called a word vector. An article may be indexed as a word vector. For example, as shown in FIG. 6B, the third article A3 may be indexed as a word vector V3.

The duplicate article processing unit 240 may group the articles based on the relative distances between the articles plotted in the coordinate space. That is, the duplicate article processing unit 240 may group articles that lie closer to one another than a predetermined threshold. For example, as shown in FIG. 6B, the first similarity calculation unit 220 may plot the first through sixth articles A1 to A6 in the coordinate space. The first similarity calculation unit 220 may group the first article A1 and the sixth article A6 into a first group G1, and may group the second article A2, the fourth article A4, and the fifth article A5 into a second group G2. The duplicate article processing unit 240 may then finally generate the first group G1 and the second group G2 by removing duplicate articles within the first group G1 and the second group G2. Although FIG. 6B shows a two-dimensional coordinate space, exemplary embodiments of the present invention are not limited thereto, and the duplicate article processing unit 240 may group a plurality of articles plotted in a coordinate space of three or more dimensions by comparing their relative distances with a predetermined threshold.
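The word-vector indexing and distance-based grouping of FIG. 6B can be sketched as follows. The description does not name a distance measure or a linking rule, so this sketch assumes Euclidean distance and assumes that articles are merged into one group whenever a chain of pairs closer than the threshold connects them. The helper names and the coordinates used in the example are hypothetical.

```python
import math


def word_vector(text, topic_words):
    """Index an article as a word vector: one count per topic word."""
    tokens = text.lower().split()
    return tuple(tokens.count(word) for word in topic_words)


def group_by_distance(vectors, threshold):
    """Group articles whose word vectors lie closer than threshold.

    vectors maps an article name to its word vector (any dimension).
    Assumption: Euclidean distance and single-link merging.
    """
    names = list(vectors)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links up to the representative of the group.
        while parent[name] != name:
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(vectors[a], vectors[b]) < threshold:
                parent[find(a)] = find(b)   # merge the two groups

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    # Articles with no close neighbour are not reported as a group.
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Hypothetical coordinates (count of first word, count of second word).
points = {"A1": (1, 5), "A6": (2, 6), "A2": (6, 2),
          "A4": (7, 1), "A5": (6, 1), "A3": (4, 9)}
print(group_by_distance(points, threshold=2.0))
```

Because `math.dist` accepts points of any dimension, the same function covers the case of three or more topic words mentioned in the description.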

FIG. 7 illustrates an implementation example of the representative article identification unit 400 of FIG. 1 according to an exemplary embodiment of the present invention. As shown in FIG. 7, the representative article identification unit 400 may include a second similarity calculation unit 420 and an article ranking unit 440. The second similarity calculation unit 420 may calculate a second similarity between the group topic of the first group, generated by the group topic generation unit 260 of the group generation unit 200, and the article topics of the articles included in the first group. Although FIG. 1 shows the group generation unit 200 as including the first similarity calculation unit 220 and the representative article identification unit 400 as including the second similarity calculation unit 420, exemplary embodiments of the present invention are not limited thereto, and the group generation unit 200 and the representative article identification unit 400 may share a single similarity calculation unit.

The article ranking unit 440 may determine the ranking of the articles included in the first group according to the second similarity calculated by the second similarity calculation unit 420. For example, the article ranking unit 440 may assign the highest rank to the article having the highest second similarity to the group topic of the first group. According to an exemplary embodiment of the present invention, the article ranking unit 440 may determine rankings for a predetermined number of articles among the articles included in the first group, and this predetermined number of articles are referred to as representative articles. In addition, the article ranking unit 440 may determine the ranking of the articles included in the first group based not only on the second similarity calculated by the second similarity calculation unit 420 but also on feedback information provided from the group providing unit 300. For example, referring also to FIG. 8, the article view count measuring unit 360 of the group providing unit 300 may pass information on the number of times the user 30 views an article (the view count) to the representative article identification unit 400, and the article ranking unit 440 of the representative article identification unit 400 may determine the ranking of the articles included in the first group based on the view count as well as the second similarity calculated by the second similarity calculation unit 420.
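A minimal sketch of the ranking performed by the article ranking unit 440 is given below. The description specifies neither the formula for the second similarity nor how the view count is combined with it, so this sketch assumes a Jaccard overlap between the word sets of the group topic and each article topic, and a weighted sum with the view count normalized to the largest view count in the group. The names `jaccard`, `rank_articles`, `view_weight`, and `top_n` are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two word sets, between 0.0 and 1.0 (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_articles(group_topic, article_topics, views=None,
                  view_weight=0.0, top_n=None):
    """Rank the articles of one group, highest score first.

    group_topic: words of the group topic.
    article_topics: maps article name to the words of its article topic.
    views: optional map of article name to view count (feedback information).
    view_weight: assumed share of the score taken from the view count.
    top_n: number of representative articles to return (all if None).
    """
    views = views or {}
    max_views = max(views.values(), default=0)

    def score(name):
        similarity = jaccard(group_topic, article_topics[name])
        popularity = views.get(name, 0) / max_views if max_views else 0.0
        return (1 - view_weight) * similarity + view_weight * popularity

    # Ties are broken by article name so the order is deterministic.
    ranked = sorted(article_topics, key=lambda name: (-score(name), name))
    return ranked if top_n is None else ranked[:top_n]


topic = {"election", "candidate", "vote"}
articles = {
    "A1": {"election", "candidate", "vote"},
    "A2": {"election", "vote", "poll"},
    "A6": {"election", "stadium"},
}
print(rank_articles(topic, articles))
print(rank_articles(topic, articles,
                    views={"A1": 10, "A2": 10, "A6": 1000},
                    view_weight=0.8, top_n=2))
```

With `view_weight` left at 0.0 the ranking depends on the second similarity alone; raising it lets a heavily viewed article overtake articles that are closer to the group topic.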

FIG. 8 illustrates an implementation example of the group providing unit 300 of FIG. 1 according to an exemplary embodiment of the present invention. As shown in FIG. 8, the group providing unit 300 may include a topic-based group providing unit 320, a representative article providing unit 340, and an article view count measuring unit 360. The topic-based group providing unit 320 may provide a predetermined number of articles to the user 30 through the network 40 according to the ranking of the articles of the first group determined by the article ranking unit 440 of the representative article identification unit 400. For example, the topic-based group providing unit 320 may display the highest-ranked article of the first group more prominently than the other articles. The topic-based group providing unit 320 may also provide the representative articles of a second group, different from the first group generated by the group generation unit 200, to the user 30 through the network 40 according to the ranking determined by the article ranking unit 440.

The representative article providing unit 340 may provide the articles included in the first group generated by the group generation unit 200 to the user 30 through the network 40. For example, the representative article providing unit 340 may list the titles and partial body text of the articles included in the first group, and when the user 30 selects one article, may provide the title and body of the selected article to the user 30. The representative article providing unit 340 may also provide the user 30 with groups similar to the group topic of the first group while providing the article selected by the user 30. For example, the representative article providing unit 340 may display the title and body of the article selected by the user 30 in the larger area of the screen while, in part of the remaining area, providing the user 30 with the titles of the representative articles of groups whose group topics are similar to the group topic of the first group.

The article view count measuring unit 360 may measure the number of times the user 30 views (the view count of) the articles provided by the representative article providing unit 340. An article viewed by a large number of users 30 may be presumed to contain content of relatively high interest to the users 30. Accordingly, referring to FIG. 7, the view count measured by the article view count measuring unit 360 may be passed to the article ranking unit 440 of the representative article identification unit 400, and the article ranking unit 440 may determine the ranking of the articles included in the first group further based on the view count. That is, the article ranking unit 440 may reflect the view count of an article when determining the ranking of the articles.

FIG. 9 illustrates the groups that the group providing unit 300 of FIG. 1 provides to the user 30 according to an exemplary embodiment of the present invention. Referring also to FIGS. 1 and 8, the group providing unit 300 may provide the groups generated by the group generation unit 200 and the representative articles identified by the representative article identification unit 400 to the user 30 through the network 40. For example, as shown in FIG. 9, the topic-based group providing unit 320 of the group providing unit 300 may list the representative articles 1100, 1200, 1300, and 1400 of a plurality of groups and provide them to the user 30. According to an exemplary embodiment of the present invention, the topic-based group providing unit 320 may list the representative articles 1100, 1200, 1300, and 1400 of the groups by displaying the titles 1120, 1220, 1320, and 1420 and partial body text 1140, 1240, 1340, and 1440 of the representative articles. Through the network 40, the user 30 can see the representative articles of each of the plurality of groups at a glance, and can easily select a group of articles related to a topic of interest by means of the titles 1120, 1220, 1320, and 1420 and partial body text 1140, 1240, 1340, and 1440 of the representative articles.

Although FIG. 9 shows an embodiment in which the topic-based group providing unit 320 provides the user with a single representative article for each group, the topic-based group providing unit 320 may provide the user 30 with a plurality of representative articles corresponding to each group. For example, the topic-based group providing unit 320 may provide the user with a predetermined number of articles as the representative articles of a group according to the ranking determined by the article ranking unit 440 of the representative article identification unit 400. In addition, the topic-based group providing unit 320 may display the highest-ranked article more prominently than the other articles.

According to an exemplary embodiment of the present invention, the topic-based group providing unit 320 of the group providing unit 300 may determine the order of the groups (or the order of the representative articles of the groups) provided to the user 30. For example, the topic-based group providing unit 320 may measure the number of times a group (or representative article) is selected by the user 30, and may order the plurality of groups provided to the user 30 based on the number of times each group is selected. Accordingly, the topic-based group providing unit 320 may display the group most frequently selected by the user 30 at the top. In addition, the topic-based group providing unit 320 may determine the order of the groups according to information set in advance by the user 30. That is, the user 30 may set in advance at least one field of interest, such as politics, economy, sports, or entertainment, and the topic-based group providing unit 320 may display the group corresponding to the field of interest of the user 30 at the top.
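The ordering of groups described above can be sketched in a few lines. The description presents selection counts and preset fields of interest as two separate ordering criteria; combining them, with interest taking priority and selection count ordering the groups within each tier, is an assumption of this sketch. The function name `order_groups` and the `id` and `category` fields are hypothetical.

```python
def order_groups(groups, selection_counts, interests=()):
    """Order groups for display, most relevant first.

    groups: list of dicts with hypothetical 'id' and 'category' fields.
    selection_counts: maps group id to how often the user selected it.
    interests: fields of interest preset by the user (may be empty).
    """
    interests = set(interests)

    def sort_key(group):
        return (
            group["category"] not in interests,      # interest matches come first
            -selection_counts.get(group["id"], 0),   # then most frequently selected
            group["id"],                             # stable tie-break
        )

    return sorted(groups, key=sort_key)


groups = [
    {"id": "G1", "category": "politics"},
    {"id": "G2", "category": "sports"},
    {"id": "G3", "category": "economy"},
]
counts = {"G1": 3, "G2": 10, "G3": 7}
print([g["id"] for g in order_groups(groups, counts)])
print([g["id"] for g in order_groups(groups, counts, interests={"politics"})])
```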

FIG. 10 illustrates the articles that the group providing unit 300 of FIG. 1 provides to the user 30 according to an exemplary embodiment of the present invention. Referring also to FIG. 8, when the topic-based group providing unit 320 of the group providing unit 300 provides the representative article of each group to the user 30 as shown in FIG. 9 and the user 30 selects one group, the representative article providing unit 340 of the group providing unit 300 may provide the articles included in the selected group to the user 30.

As shown in FIG. 10, the representative article providing unit 340 may display the title 2220 and body 2240 of the representative article 2200 ranked highest by the article ranking unit 440 of the representative article identification unit 400 in the largest area. In addition, the representative article providing unit 340 may display a list 2400 of the titles of the other articles included in the selected group, and when the user 30 selects one of the articles in the list 2400, the title and body of the selected article may be displayed in place of the title 2220 and body 2240 of the representative article 2200.

According to an exemplary embodiment of the present invention, the representative article providing unit 340 may provide the user 30 with at least one group having a group topic similar to that of the group selected by the user 30. For example, as described with reference to FIG. 5, according to the similarity between the group topics of the plurality of groups generated through the group topic generation unit 260 of the group generation unit 200, the representative article providing unit 340 may provide the user 30 with a list 2600 of the representative articles of a plurality of groups whose group topics are similar to that of the group selected by the user 30.
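Selecting the groups shown in the list 2600 can be sketched as a nearest-neighbour lookup over group topics. The measure of similarity between group topics is not given in this passage, so the sketch assumes a word-set overlap (intersection size divided by union size); the function name `similar_groups` and the parameters `k` and `min_similarity` are hypothetical.

```python
def similar_groups(selected, group_topics, k=3, min_similarity=0.0):
    """Return up to k group ids whose topics are most similar to the selected group.

    group_topics maps a group id to the set of words in its group topic.
    Groups whose overlap does not exceed min_similarity are left out.
    """
    def overlap(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    target = group_topics[selected]
    scored = [
        (overlap(target, words), group_id)
        for group_id, words in group_topics.items()
        if group_id != selected
    ]
    scored = [(s, group_id) for s, group_id in scored if s > min_similarity]
    scored.sort(key=lambda item: (-item[0], item[1]))   # best match first
    return [group_id for _, group_id in scored[:k]]


topics = {
    "G1": {"election", "vote"},
    "G2": {"election", "poll", "vote"},
    "G3": {"football", "league"},
    "G4": {"vote", "tax"},
}
print(similar_groups("G1", topics))
```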

Meanwhile, according to an exemplary embodiment of the present invention, the article view count measuring unit 360 of the group providing unit 300 may measure the number of times the user 30 views (the view count of) an article provided by the representative article providing unit 340. The article view count measuring unit 360 may pass the number of times the user 30 has viewed the article to the article ranking unit 440 of the representative article identification unit 400. The article ranking unit 440 may determine the ranking of an article based on the number of times the article has been viewed as well as the second similarity between the group topic and the article topic of the article. Accordingly, when the article ranking unit 440 changes the ranking of articles according to their view counts, the order of the articles listed by the representative article providing unit 340 of the group providing unit 300 may also change.

The above description of the embodiments is given merely by way of example with reference to the drawings to provide a more thorough understanding of the present invention, and should not be construed as limiting the present invention. It will also be apparent to those of ordinary skill in the art to which the present invention pertains that various changes and modifications can be made without departing from the basic principles of the present invention.

Claims

An article grouping system comprising:
a topic extraction unit that extracts article topics of a plurality of articles stored in a database;
a group generation unit that generates a first group by grouping the plurality of articles based on the article topics; and
a group providing unit that provides the first group to a user via a network,
wherein the group generation unit comprises:
a first similarity calculation unit that calculates a first similarity between the article topics;
a duplicate article processing unit that removes duplicate articles based on the first similarity and generates the first group; and
a group topic generation unit that generates a group topic of the first group based on the first similarity,
wherein the duplicate article processing unit selectively includes a first article stored in the database in the first group based on a similarity between the group topic of the first group and the article topic of the first article.

The article grouping system of claim 1,
wherein each of the article topics comprises at least one word, and
the topic extraction unit comprises:
a word extraction unit that extracts at least one main word from each of the plurality of articles; and
a topic generation unit that generates the article topic of each of the plurality of articles based on the main word.

The article grouping system of claim 2,
wherein the word extraction unit extracts the main word based on the frequency of the words included in each of the plurality of articles.

(Deleted)

The article grouping system of claim 1,
wherein the duplicate article processing unit divides the first group into two or more groups based on the first similarity and the number of articles included in the first group.

The article grouping system of claim 1,
further comprising a representative article identification unit that identifies at least one representative article included in the first group based on the article topics of the articles included in the first group and the group topic of the first group.

The article grouping system of claim 7,
wherein the representative article identification unit comprises:
a second similarity calculation unit that calculates a second similarity between the article topics of the articles included in the first group and the group topic of the first group; and
an article ranking unit that determines a ranking of the articles included in the first group based on the second similarity.

The article grouping system of claim 7,
wherein the group providing unit comprises:
a topic-based group providing unit that provides a plurality of groups including the first group to the user; and
a representative article providing unit that provides the representative article of the first group to the user.

The article grouping system of claim 8,
wherein the group providing unit further comprises an article view count measuring unit that measures the number of views by the user of the articles included in the first group, and
the article ranking unit determines the ranking further based on the number of views.