JP7068404B2

JP7068404B2 - A method and system for providing a document timeline using clusters of issue units that are relevant over a long period of time.

Info

Publication number: JP7068404B2
Application number: JP2020138409A
Authority: JP
Inventors: ボンソクユ; スヒャンキム; ヘジンキム; サンヒイム; ミョンボンパク; ヘインキム; ジョンシクヤン; ギョンミイ; ビョンジュンキム; スンハクチェ; ユジョンソ
Original assignee: Naver Corp
Current assignee: Naver Corp
Priority date: 2019-08-21
Filing date: 2020-08-19
Publication date: 2022-05-16
Anticipated expiration: 2040-08-19
Also published as: JP2021034048A

Description

The following description relates to a technique for providing a document timeline.

In recent years, various studies have been conducted on systematically classifying and grouping documents for intelligent information services such as information retrieval and recommendation.

Grouping, that is, clustering, is a data mining technique that computes the similarity of a large number of items based on their attributes and groups them accordingly.

Clustering is the process of dividing a given data set into a plurality of clusters of mutually similar data, such that the data belonging to one group shares a similarity that distinguishes it from the data in other groups.

As such document clustering methods, various methods, including the K-means clustering method and clustering methods using an ontology, are being studied in the field of information retrieval.

For example, Patent Document 1 (publication date: September 27, 2011) discloses a technique for clustering documents using an ontology that includes keywords of the documents.

Korean Registered Patent Publication No. 10-1067819

Provided are a method and a system that can generate an issue cluster as a cluster group of an issue unit that is related over the medium to long term, and that can provide a document timeline using the issue cluster.

Provided is a document timeline providing method executed by a computer system, wherein the computer system includes at least one processor configured to execute computer-readable instructions contained in a memory, and the document timeline providing method includes: generating, by the at least one processor, an issue cluster as a cluster group by merging clusters composed of similar documents based on the similarity between the clusters; and displaying, by the at least one processor, a timeline for the documents included in the issue cluster using the issue cluster.

According to one aspect, the displaying may include determining, as a display target, an issue cluster that includes a document specified by a user's selection or setting or a document specified by a content provider.

According to another aspect, the displaying may include determining, as a display target, an issue cluster for which at least one of the number of documents, the elapsed time since clustering, and the number of comments satisfies a predetermined condition.

According to still another aspect, the displaying may display a timeline that includes a time area, in which the number of documents per unit time included in the issue cluster is displayed in graph form, and a document area, in which the list of documents included in the issue cluster is displayed per unit time.

According to still another aspect, a navigation function may be provided by a structure in which the time area and the document area are organically linked.

According to still another aspect, the time area may be scrollable in one direction, and the document area may include an interface that is scrollable in a direction different from that of the time area.

According to still another aspect, as the document area is scrolled, the time area may be scrolled automatically to match the portion of the document list displayed on the screen.

According to still another aspect, when the document area is scrolled, the time area may be displayed in a simplified form in which the size of the time area or the graph of the time area is reduced.

According to still another aspect, the time area may be selectively shown or hidden according to the direction in which the document area is scrolled.

According to yet another aspect, when the graph bar of a specific unit time is selected in the time area, the document area may be scrolled to, and display, the document list of the selected unit time.
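The time area and document area described in the aspects above can be pictured as two views over the same grouping of documents by unit time. The following is only a minimal sketch under stated assumptions: one day is taken as the unit time, documents are plain dictionaries with hypothetical `title` and `published` keys, and the function names `build_timeline` and `section_index_for_bar` are illustrative and do not appear in the specification.

```python
from collections import defaultdict
from datetime import date, datetime


def build_timeline(documents):
    """Group documents by day (the unit time assumed in this sketch).

    Returns two parallel views:
      time_area     -> [(day, document_count)]  data for the bar graph
      document_area -> [(day, [titles])]        data for the per-day lists
    """
    by_day = defaultdict(list)
    for doc in documents:
        by_day[doc["published"].date()].append(doc["title"])
    days = sorted(by_day)
    time_area = [(day, len(by_day[day])) for day in days]
    document_area = [(day, by_day[day]) for day in days]
    return time_area, document_area


def section_index_for_bar(document_area, selected_day):
    """Return the index of the document-list section to scroll to when the
    bar for selected_day is chosen, or None if that day has no documents."""
    for index, (day, _titles) in enumerate(document_area):
        if day == selected_day:
            return index
    return None
```

Because both views are built from the same sorted list of days, a bar at position i in the time area corresponds to section i in the document area, which is one simple way to keep the two areas linked during scrolling.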

Provided is a computer program recorded on a non-transitory computer-readable recording medium to cause the computer system to execute the document timeline providing method.

Provided is a non-transitory computer-readable recording medium on which a program for causing a computer to execute the document timeline providing method is recorded.

Provided is a computer system including at least one processor configured to execute computer-readable instructions contained in a memory, wherein the at least one processor includes: a cluster generation unit that generates an issue cluster as a cluster group by merging clusters composed of similar documents based on the similarity between the clusters; and a cluster display unit that displays a timeline for the documents included in the issue cluster using the issue cluster.

According to embodiments of the present invention, short-term clusters generated per time period are merged based on the similarity between clusters, so that an issue cluster can be generated as a cluster group of an issue unit related over the medium to long term, and the issue cluster can be used to provide a document timeline that shows articles with high long-term relevance in chronological order.

A diagram showing an example of a network environment according to an embodiment of the present invention.

A block diagram for describing the internal configuration of an electronic device and a server according to an embodiment of the present invention.

A diagram showing an example of components that the processor of a server can include according to an embodiment of the present invention.

A flowchart showing an example of a method that a server can execute according to an embodiment of the present invention.

An exemplary diagram showing a process of merging clusters based on the similarity between clusters according to an embodiment of the present invention.

An exemplary diagram showing a process of merging clusters based on the similarity between clusters according to an embodiment of the present invention.

An exemplary diagram showing a process of merging clusters based on the similarity between clusters according to an embodiment of the present invention.

An exemplary diagram showing a process of merging clusters based on the similarity between clusters according to an embodiment of the present invention.

A flowchart showing an example of a cluster merging process according to an embodiment of the present invention.

An exemplary diagram showing an interface screen of an issue timeline using an issue cluster according to an embodiment of the present invention.

An exemplary diagram showing an interface screen of an issue timeline using an issue cluster according to an embodiment of the present invention.

An exemplary diagram showing an interface screen of an issue timeline using an issue cluster according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Embodiments of the present invention relate to document clustering technology.

Embodiments including the matters specifically disclosed herein can generate an issue cluster as a cluster group of an issue unit related over the medium to long term by merging short-term clusters generated per time period based on the similarity between clusters, and can use the issue cluster to provide a document timeline that shows articles with high long-term relevance in chronological order.

Embodiments including the matters specifically disclosed herein use the issue cluster to display a display area for unit times together with the documents clustered for each unit time, so that documents with high long-term relevance can be grasped at a glance, and provide a navigation function for checking documents in a structure in which the areas are organically linked, thereby improving user convenience.

FIG. 1 is a diagram showing an example of a network environment according to an embodiment of the present invention. The network environment of FIG. 1 is shown as an example including a plurality of electronic devices 110, 120, 130, 140, a plurality of servers 150, 160, and a network 170. FIG. 1 is merely an example for describing the invention, and the number of electronic devices and the number of servers are not limited to those shown in FIG. 1.

The plurality of electronic devices 110, 120, 130, 140 may be fixed terminals or mobile terminals realized by a computer system. Examples of the plurality of electronic devices 110, 120, 130, 140 include AI speakers, smartphones, mobile phones, navigation devices, PCs (personal computers), notebook PCs, digital broadcasting terminals, PDAs (Personal Digital Assistants), PMPs (Portable Multimedia Players), tablets, game consoles, wearable devices, IoT (internet of things) devices, VR (virtual reality) devices, and AR (augmented reality) devices. As an example, FIG. 1 shows a smartphone as the electronic device 110, but in embodiments of the present invention, the electronic device 110 may mean any one of a variety of physical computer systems that can communicate with the other electronic devices 120, 130, 140 and/or the servers 150, 160 via the network 170 using a wireless or wired communication method.

The communication method is not limited, and may include not only communication methods that use a communication network that the network 170 can include (for example, a mobile communication network, the wired Internet, the wireless Internet, a broadcasting network, or a satellite network), but also short-range wireless communication between devices. For example, the network 170 may include any one or more of networks such as a PAN (personal area network), a LAN (local area network), a CAN (campus area network), a MAN (metropolitan area network), a WAN (wide area network), a BBN (broadband network), and the Internet. Further, the network 170 may include any one or more network topologies, including a bus network, a star network, a ring network, a mesh network, a star-bus network, and a tree or hierarchical network, but is not limited to these.

Each of the servers 150, 160 may be realized by one or more computer devices that communicate with the plurality of electronic devices 110, 120, 130, 140 via the network 170 to provide instructions, code, files, content, services, and the like. For example, the server 150 may be a system that provides a first service to the plurality of electronic devices 110, 120, 130, 140 connected via the network 170, and the server 160 may likewise be a system that provides a second service to the plurality of electronic devices 110, 120, 130, 140 connected via the network 170. As a more specific example, the server 150 may provide, as the first service, the service targeted by an application (for example, a news service) to the plurality of electronic devices 110, 120, 130, 140 through that application, which is a computer program installed and executed on the plurality of electronic devices 110, 120, 130, 140. As another example, the server 160 may provide, as the second service, a service that distributes files for installing and executing the above-described application to the plurality of electronic devices 110, 120, 130, 140.

FIG. 2 is a block diagram for describing the internal configuration of an electronic device and a server according to an embodiment of the present invention. FIG. 2 describes the internal configuration of the electronic device 110, as an example of an electronic device, and the internal configuration of the server 150. The other electronic devices 120, 130, 140 and the server 160 may have an internal configuration that is the same as or similar to that of the electronic device 110 or the server 150 described above.

The electronic device 110 and the server 150 may include memories 211, 221, processors 212, 222, communication modules 213, 223, and input/output interfaces 214, 224. The memories 211, 221 are non-transitory computer-readable recording media and may include a permanent mass storage device such as a RAM (random access memory), a ROM (read only memory), a disk drive, an SSD (solid state drive), or a flash memory. Here, a permanent mass storage device such as a ROM, an SSD, a flash memory, or a disk drive may also be included in the electronic device 110 or the server 150 as a separate permanent storage device distinct from the memories 211, 221. In addition, an operating system and at least one program code (for example, code for a browser installed and executed on the electronic device 110, or for an application installed on the electronic device 110 to provide a specific service) may be recorded in the memories 211, 221. Such software components may be loaded from a computer-readable recording medium separate from the memories 211, 221. Such a separate computer-readable recording medium may include a floppy (registered trademark) drive, a disk, a tape, a DVD/CD-ROM drive, a memory card, and the like. In other embodiments, the software components may be loaded into the memories 211, 221 through the communication modules 213, 223 rather than from a computer-readable recording medium. For example, at least one program may be loaded into the memories 211, 221 based on a computer program (for example, the application described above) that is installed from files provided via the network 170 by developers or by a file distribution system (for example, the server 160 described above) that distributes installation files of applications.

The processors 212, 222 may be configured to process the instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processors 212, 222 by the memories 211, 221 or the communication modules 213, 223. For example, the processors 212, 222 may be configured to execute instructions received according to program code recorded in a storage device such as the memories 211, 221.

The communication modules 213, 223 may provide a function for the electronic device 110 and the server 150 to communicate with each other via the network 170, and may provide a function for the electronic device 110 and/or the server 150 to communicate with another electronic device (for example, the electronic device 120) or another server (for example, the server 160). As an example, a request generated by the processor 212 of the electronic device 110 according to program code recorded in a storage device such as the memory 211 may be transmitted to the server 150 via the network 170 under the control of the communication module 213. Conversely, control signals, instructions, content, files, and the like provided under the control of the processor 222 of the server 150 may pass through the communication module 223 and the network 170 and be received by the electronic device 110 through its communication module 213. For example, control signals, instructions, content, files, and the like of the server 150 received through the communication module 213 may be transferred to the processor 212 or the memory 211, and content, files, and the like may be recorded on a recording medium (the permanent storage device described above) that the electronic device 110 can further include.

The input/output interface 214 may be a means for interfacing with an input/output device 215. For example, the input device may include a device such as a keyboard, a mouse, a microphone, or a camera, and the output device may include a device such as a display, a speaker, or a haptic feedback device. As another example, the input/output interface 214 may be a means for interfacing with a device in which the input and output functions are integrated into one, such as a touch screen. The input/output device 215 may be configured as a single device together with the electronic device 110. The input/output interface 224 of the server 150 may be a means for interfacing with a device (not shown) for input or output that is connected to the server 150 or that the server 150 can include. As a more specific example, when the processor 212 of the electronic device 110 processes the instructions of a computer program loaded into the memory 211, a service screen or content composed using data provided by the server 150 or the electronic device 120 may be displayed on the display through the input/output interface 214.

In other embodiments, the electronic device 110 and the server 150 may include more components than those shown in FIG. 2. However, most conventional components need not be clearly illustrated. For example, the electronic device 110 may be implemented to include at least some of the input/output devices 215 described above, and may further include other components such as a transceiver, a camera, various sensors, and a database. As a more specific example, when the electronic device 110 is an AI speaker, it may be implemented to further include the various components that an AI speaker generally includes, such as various sensors, a camera module, various physical buttons, buttons using a touch panel, input/output ports, and a vibrator for vibration.

In the following, specific embodiments of a method and a system for clustering documents with high long-term relevance will be described.

In this specification, a document may mean a unit of information that is a target of search, recommendation, and the like on the Internet.

In this embodiment, an article (news) provided through a news service is described as a representative example of a document, but the target of clustering is not limited to articles, and the technique can be extended to all forms of documents provided on the Internet as units of information.

Document clustering technology is used to classify and deliver large numbers of articles effectively by subject, and a scheme is also used in which a predetermined algorithm displays specific clustered articles at the top of a service screen based on factors such as the aggregate size and importance of the clustered articles.

Such clustering technology clusters articles with similar content. Quality is high when it is applied mainly to the latest articles, but the quality of clustering decreases as the time range widens. In addition, there is a limit to clustering current-affairs articles whose subject changes over time, and there is a problem that the processing time increases sharply as the number of articles increases.

An object of embodiments of the present invention is to generate an issue cluster as a cluster group of an issue unit related over the medium to long term by merging short-term clusters generated per time period.

FIG. 3 is a block diagram showing an example of components that the processor of a server can include according to an embodiment of the present invention, and FIG. 4 is a flowchart showing an example of a method that the server can execute according to an embodiment of the present invention.

The server 150 according to this embodiment may serve as a news service platform that provides articles. In particular, the server 150 may effectively cluster and provide articles that correspond to issues continuing over the medium to long term.

As shown in FIG. 3, the processor 222 of the server 150 may include a cluster collection unit 310, a cluster generation unit 320, and a cluster display unit 330 as components for executing the document clustering method of FIG. 4. Depending on the embodiment, components of the processor 222 may be selectively included in or excluded from the processor 222. Also depending on the embodiment, components of the processor 222 may be separated or merged to express the functions of the processor 222.

The processor 222 and its components may control the server 150 to execute steps 410 to 450 included in the document clustering method of FIG. 4. For example, the processor 222 and its components may be implemented to execute instructions according to the code of the operating system and the code of at least one program contained in the memory 221.

Here, the components of the processor 222 may be representations of different functions executed by the processor 222 according to instructions provided by program code recorded in the server 150. For example, the cluster collection unit 310 may be used as a functional representation of the processor 222 controlling the server 150, according to the above-described instructions, so that the server 150 collects short-term clusters.

The processor 222 may read the necessary instructions from the memory 221 into which instructions related to the control of the server 150 have been loaded. In this case, the read instructions may include instructions for controlling the processor 222 to execute steps 410 to 450 described below. Steps 410 to 450 described below may be executed in an order different from that shown in FIG. 4, some of steps 410 to 450 may be omitted, and additional steps may be included.

In the following, a cluster in which highly related articles are bundled on a time basis is referred to as a "short-term cluster", and when highly related clusters are bundled to form a single issue unit, the cluster group corresponding to that issue unit is referred to as an "issue cluster".

Referring to FIG. 4, in step 410, the cluster collection unit 310 may collect short-term clusters, each clustered as similar articles based on the similarity between articles. The cluster collection unit 310 may collect short-term clusters generated by clustering articles from within a recent fixed period of time. At least one widely used clustering technique may be used as the article clustering method for generating the short-term clusters. A unique identifier (ID) may be assigned to each short-term cluster during cluster generation. During article clustering, an existing cluster that was generated previously may retain the identifier previously assigned to it.

For example, referring to FIG. 5, the cluster collection unit 310 may bundle different articles 50 into one cluster to generate a short-term cluster 501 when the similarity between the vectors of the articles 50 is at or above a certain level. Here, the vector of each article 50 may mean the average of the vectors of the words included in that article 50. The cluster collection unit 310 may bundle similar articles from within the most recent 36 hours as a short-term cluster 501. Even when the identifier (ID) previously assigned to a short-term cluster 501 is retained, articles older than 36 hours are removed from that cluster, and only articles from within the most recent 36 hours may be displayed as service targets. In embodiments of the present invention, when collecting the short-term clusters 501, the articles removed from a short-term cluster 501 over time may also all be accumulated and collected under the existing identifier (ID).
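The article vector and the 36-hour window described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in the specification: vectors are plain lists of floats, articles are dictionaries with hypothetical `id` and `published` keys, and the names `article_vector`, `served_articles`, and `accumulate` are introduced here only for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=36)  # window length quoted in the text


def article_vector(word_vectors):
    """Average the word vectors of one article, component by component."""
    count = len(word_vectors)
    dim = len(word_vectors[0])
    return [sum(v[i] for v in word_vectors) / count for i in range(dim)]


def served_articles(cluster_articles, now):
    """Articles still inside the most recent 36 hours (the displayed set)."""
    return [a for a in cluster_articles if now - a["published"] <= WINDOW]


def accumulate(history, cluster_id, cluster_articles):
    """Record every article ever seen under a cluster ID, so that articles
    later dropped from the served set remain available for issue clustering."""
    seen = history.setdefault(cluster_id, {})
    for article in cluster_articles:
        seen[article["id"]] = article
    return history
```

Keeping the served set and the accumulated history separate reflects the distinction drawn in the text: the service shows only recent articles, while collection keeps expired articles under the same cluster ID.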

図４を参照すると、段階４２０で、クラスタ生成部３２０は、クラスタ間の類似度に基づいて短期クラスタを併合することにより、イシュークラスタを生成してよい。短期クラスタのうちの一部は類似する主題の記事で構成されているが、１つのクラスタとして束ねられていなかったり、記事の時間差のせいで互いに異なるクラスタとして存在したりする場合が多い。このような問題を解決するために、クラスタ生成部３２０は、短期クラスタがクラスタ併合の基準を満たす場合、つまり、短期クラスタ間の類似度が一定レベル以上の場合、１つのイシュークラスタとして束ねてよい。 Referring to FIG. 4, in step 420, the cluster generation unit 320 may generate an issue cluster by merging short-term clusters based on the similarity between the clusters. Some short-term clusters consist of articles on similar subjects but often have not been bundled into one cluster, or exist as separate clusters because of the time difference between their articles. To solve this problem, the cluster generation unit 320 may bundle short-term clusters into one issue cluster when they meet the cluster merge criterion, that is, when the similarity between the short-term clusters is at or above a certain level.

例えば、図６を参照すると、クラスタ生成部３２０は、短期クラスタ５０１のうち、短期クラスタ５０１のベクトル間の類似度が一定レベル以上の場合、１つのクラスタとして併合してイシュークラスタ６０２を生成してよい。短期クラスタ５０１のベクトルとは、該当のクラスタに含まれた記事のベクトル平均を意味してよい。 For example, referring to FIG. 6, the cluster generation unit 320 may generate an issue cluster 602 by merging, into one cluster, those short-term clusters 501 whose vectors have a similarity at or above a certain level. The vector of a short-term cluster 501 may mean the average of the vectors of the articles contained in that cluster.

クラスタ併合の基準としてクラスタ間の類似度を適用するが、一例として、クラスタベクトルどうしを比較し、ユークリッド（Ｅｕｃｌｉｄ）値とコサイン（ｃｏｓｉｎｅ）値のうちの少なくとも１つが定められた範囲にあるクラスタを１つのクラスタとして束ねてよい。例えば、２つのクラスタのベクトル間のコサイン類似度が０．９８以上の場合、および／または２つのクラスタのベクトル間のユークリッド類似度が０．７１以上の場合、２つのクラスタを１つのクラスタとして束ねてよい。このとき、ベクトル間の類似度は、「１／（１＋２つのベクトル間の距離）」のように定義されてよい。 The similarity between clusters is applied as the cluster merge criterion. As one example, cluster vectors may be compared with each other, and clusters for which at least one of the Euclidean value and the cosine value falls within a defined range may be bundled into one cluster. For example, two clusters may be bundled into one cluster when the cosine similarity between their vectors is 0.98 or higher, and/or when the Euclidean similarity between their vectors is 0.71 or higher. Here, the similarity between vectors may be defined as "1 / (1 + distance between the two vectors)".
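As an illustration of the merge criterion above, the following sketch computes both similarity values and applies the example thresholds of 0.98 and 0.71. The function names and the `require_both` switch (modelling the "and/or" in the text) are assumptions introduced for the example. With the definition 1 / (1 + distance), a Euclidean similarity of 0.71 corresponds to a distance of roughly 0.41 or less.

```python
import math

COSINE_THRESHOLD = 0.98   # example value from the text
EUCLID_THRESHOLD = 0.71   # example value from the text


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_similarity(a, b):
    """Similarity defined as 1 / (1 + Euclidean distance between the vectors)."""
    return 1.0 / (1.0 + math.dist(a, b))


def meets_merge_criterion(a, b, require_both=False):
    """Check the example thresholds; `require_both` picks 'and' versus 'or'."""
    cos_ok = cosine_similarity(a, b) >= COSINE_THRESHOLD
    euc_ok = euclidean_similarity(a, b) >= EUCLID_THRESHOLD
    return (cos_ok and euc_ok) if require_both else (cos_ok or euc_ok)
```

For instance, `meets_merge_criterion([1.0, 0.0], [1.0, 0.1])` holds under both thresholds, while two orthogonal unit vectors fail both.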

クラスタのベクトル値を求めるために、従来は、クラスタに記事が追加される場合、全体記事のベクトルを用いて全体の平均を新たに求める方法を適用していたが、このような場合には多数のＡＰIの呼び出しと多数の計算を含むようになり、記事が増加するほどその計算量も増加していた。 Conventionally, when an article was added to a cluster, the vector value of the cluster was obtained by recomputing the average over the vectors of all articles in the cluster. This involves many API calls and many calculations, and the amount of computation grows as the number of articles increases.

本実施形態では、短期クラスタｃに記事ｎが追加される場合、短期クラスタｃに対して以前に計算されたベクトルｖを利用して新規ベクトルｖ’を一度に求めることができるが、これは数式（１）のように定義されてよい。 In the present embodiment, when an article n is added to a short-term cluster c, the new vector v' can be obtained in a single step by using the vector v previously calculated for the short-term cluster c, which may be defined as Equation (1).

ｖ’＝（ｖ_ｎ＋（ｖ×ｍ_ｃ））／（１＋ｍ_ｃ）・・・（１） $v' = \frac{v_n + (v \times m_c)}{1 + m_c}$ ... (1)

ここで、ｖ_ｎは追加される記事ｎのベクトル値、ｍ_ｃは短期クラスタｃに含まれた既存の記事件数を意味する。 Here, $v_n$ is the vector value of the article n being added, and $m_c$ is the number of existing articles included in the short-term cluster c.

このような方式により、以前に計算された短期クラスタのベクトル値を利用することで、記事が追加された短期クラスタの新規ベクトル値を容易かつ迅速に求めることができる。 By such a method, the vector value of the short-term cluster to which the article is added can be easily and quickly obtained by using the vector value of the short-term cluster calculated previously.
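The incremental update of Equation (1) can be sketched as below. The function name and the list-of-floats representation are assumptions for illustration; the arithmetic follows the equation term by term.

```python
def add_article(cluster_vector, article_count, new_article_vector):
    """Update a short-term cluster vector when one article is added.

    Implements v' = (v_n + v * m_c) / (1 + m_c), where v is the current
    cluster vector, m_c the current article count, and v_n the new article's
    vector. Returns the new vector and the new article count.
    """
    m_c = article_count
    updated = [(v_n + v * m_c) / (1 + m_c)
               for v, v_n in zip(cluster_vector, new_article_vector)]
    return updated, m_c + 1


# A cluster holding articles [1, 1] and [3, 3] has mean [2, 2] and count 2.
new_vector, new_count = add_article([2.0, 2.0], 2, [5.0, 5.0])
```

The result equals the mean of all three article vectors, but only the stored cluster vector and the article count are needed to compute it.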

図４を参照すると、段階４３０で、クラスタ生成部３２０は、クラスタ間の類似度に基づいて新たに生成された短期クラスタを、段階４２０で生成されたイシュークラスタと併合してよい。 Referring to FIG. 4, in step 430, the cluster generation unit 320 may merge a newly generated short-term cluster with the issue cluster generated in step 420, based on the similarity between the clusters.

図７に示すように、クラスタ生成部３２０は、段階４２０で生成されたイシュークラスタ６０２のベクトルと新たに生成される短期クラスタ５０１’のベクトルとを比較し、クラスタ併合の基準を満たすベクトル類似度をもつ場合、短期クラスタ５０１’をイシュークラスタ６０２に含ませてよい。このとき、イシュークラスタのベクトルとは、該当のクラスタに含まれた短期クラスタのベクトル平均を意味してよく、短期クラスタとイシュークラスタの併合基準は、上述した短期クラスタ間の併合基準と等しい。 As shown in FIG. 7, the cluster generation unit 320 may compare the vector of the issue cluster 602 generated in step 420 with the vector of a newly generated short-term cluster 501' and, when the vector similarity meets the cluster merge criterion, include the short-term cluster 501' in the issue cluster 602. Here, the vector of an issue cluster may mean the average of the vectors of the short-term clusters included in that cluster, and the criterion for merging a short-term cluster with an issue cluster is the same as the criterion for merging short-term clusters described above.

図４を参照すると、段階４４０で、クラスタ生成部３２０は、クラスタ間の類似度に基づいて以前に生成されたイシュークラスタを、段階４３０で生成されたイシュークラスタと併合してよい。イシュークラスタとイシュークラスタ間の併合基準は、上述した短期クラスタ間の併合基準と等しい。 Referring to FIG. 4, in step 440, the cluster generation unit 320 may merge a previously generated issue cluster with the issue cluster generated in step 430, based on the similarity between the clusters. The criterion for merging issue clusters with each other is the same as the criterion for merging short-term clusters described above.

図８に示すように、クラスタ生成部３２０は、短期クラスタの併合によって生成されたイシュークラスタのベクトルと以前に生成されたイシュークラスタ６０２’のベクトルとを比較し、クラスタ併合の基準を満たすベクトル類似度をもつ場合、１つのイシュークラスタ６０２として併合してよい。言い換えれば、クラスタ生成部３２０は、イシュークラスタ間のベクトルを互いに比較し、クラスタ併合の基準に適えば１つのイシュークラスタ６０２として束ねてよい。このとき、クラスタ生成部３２０は、クラスタ併合の基準を満たすイシュークラスタの記事件数を比較し、記事件数が多いイシュークラスタに残りのイシュークラスタの記事を併合してよい。 As shown in FIG. 8, the cluster generation unit 320 may compare the vector of the issue cluster generated by merging short-term clusters with the vector of a previously generated issue cluster 602' and, when the vector similarity meets the cluster merge criterion, merge them into one issue cluster 602. In other words, the cluster generation unit 320 may compare the vectors of issue clusters with each other and bundle them into one issue cluster 602 if the cluster merge criterion is met. In doing so, the cluster generation unit 320 may compare the article counts of the issue clusters that meet the merge criterion and merge the articles of the other issue clusters into the issue cluster with the larger article count.

本実施形態では、イシュークラスタＣに新たな短期クラスタｃが追加されるか、（記事件数がより少ない）小さいイシュークラスタＣ’を（記事件数がより多い）大きいイシュークラスタＣに併合する場合、イシュークラスタＣに対して以前に計算されたベクトルＶを利用して新規ベクトルＶ’を一度に求めることができるが、これは数式（２）または数式（３）のように定義されてよい。 In the present embodiment, when a new short-term cluster c is added to an issue cluster C, or when a small issue cluster C' (with fewer articles) is merged into a large issue cluster C (with more articles), the new vector V' can be obtained in a single step by using the vector V previously calculated for the issue cluster C, which may be defined as Equation (2) or Equation (3).

Ｖ’＝（（ｖ_ｃ×ｍ_ｃ）＋（Ｖ×ｍ_Ｃ））／（ｍ_ｃ＋ｍ_Ｃ）・・・（２） $V' = \frac{(v_c \times m_c) + (V \times m_C)}{m_c + m_C}$ ... (2)

ここで、ｖ_ｃは短期クラスタｃのベクトル値、ｍ_ｃは短期クラスタｃに含まれた記事件数、ｍ_ＣはイシュークラスタＣに含まれた既存の記事件数を意味する。 Here, $v_c$ is the vector value of the short-term cluster c, $m_c$ is the number of articles included in the short-term cluster c, and $m_C$ is the number of existing articles included in the issue cluster C.

Ｖ’＝（（ｖ_Ｃ’×ｍ_Ｃ’）＋（Ｖ×ｍ_Ｃ））／（ｍ_Ｃ’＋ｍ_Ｃ）・・・（３） $V' = \frac{(v_{C'} \times m_{C'}) + (V \times m_C)}{m_{C'} + m_C}$ ... (3)

ここで、ｖ_Ｃ’は小さいイシュークラスタＣ’のベクトル値、ｍ_Ｃ’は小さいイシュークラスタＣ’に含まれた記事件数、ｍ_ＣはイシュークラスタＣに含まれた既存の記事件数を意味する。 Here, $v_{C'}$ is the vector value of the small issue cluster C', $m_{C'}$ is the number of articles included in the small issue cluster C', and $m_C$ is the number of existing articles included in the issue cluster C.

このような方式により、以前に計算されたイシュークラスタのベクトル値を利用することで、短期クラスタあるいは小さいイシュークラスタが併合されたイシュークラスタの新規ベクトル値を容易かつ迅速に求めることができる。 By using such a previously calculated vector value of an issue cluster, it is possible to easily and quickly obtain a new vector value of an issue cluster in which a short-term cluster or a small issue cluster is merged.
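Equations (2) and (3) share one form: a weighted average of two cluster vectors, weighted by article count. A minimal sketch follows, with hypothetical function and parameter names; the same function covers adding a short-term cluster c to an issue cluster C and merging a small issue cluster C' into C.

```python
def merge_vectors(v_small, m_small, v_large, m_large):
    """Weighted average of two cluster vectors by article count.

    Implements V' = (v_small * m_small + V * m_large) / (m_small + m_large).
    Returns the merged vector and the merged article count.
    """
    total = m_small + m_large
    merged = [(a * m_small + b * m_large) / total
              for a, b in zip(v_small, v_large)]
    return merged, total


# A 1-article cluster at [0, 0] merged into a 3-article cluster at [4, 8].
merged_vector, merged_count = merge_vectors([0.0, 0.0], 1, [4.0, 8.0], 3)
```

Because the weights are article counts, the merged vector is the same as the average over all articles of both clusters, without revisiting the individual article vectors.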

クラスタに対して全体記事のベクトル平均を毎回のように求める場合に比べ、以前の過程で予め計算されたベクトル値を利用することにより、記事が追加されるか他のクラスタと併合されるクラスタの新規ベクトルを飛躍的に迅速に求めることができる。 Compared to the case where the vector average of the entire article is calculated for the cluster every time, by using the vector value calculated in advance in the previous process, the article is added or merged with other clusters. New vectors can be obtained dramatically and quickly.

上述したイシュークラスタの生成過程４１０～４４０は例示的なものに過ぎず、これに限定されることはない。 The issue cluster generation processes 410 to 440 described above are merely exemplary and are not limited thereto.

また、Ａ、Ｂ、Ｃ、Ｄの短期クラスタあるいはイシュークラスタが存在すると仮定するとき、クラスタ間の併合過程４２０～４４０で、Ａ－Ｂ、Ａ－Ｃ、Ｂ－Ｃ、Ｃ－Ｄがクラスタ併合の条件を満たす場合に併合を一度に行うようになれば、Ａ、Ｂ、Ｃ、Ｄがすべて１つのクラスタとして併合される。 Also, assuming that there are short-term clusters or issue clusters of A, B, C, D, AB, AC, BC, CD are cluster merged in the merge process 420-440 between the clusters. If the merging is performed at once when the above conditions are satisfied, A, B, C, and D are all merged as one cluster.

しかし、クラスタ併合の条件を満たすクラスタを一度に併合する場合、次のような問題が発生することがある。 However, when merging clusters that satisfy the conditions for cluster merging at once, the following problems may occur.

先ず、クラスタＡとＢの併合結果、併合されたクラスタＡＢのベクトルとクラスタＤ間のベクトル距離が遠くなってクラスタ併合の条件を満たすことができなくなることがあり、クラスタ併合の条件にまったく適わなかったクラスタＥと併合クラスタＡＢとのベクトル距離が近くなってクラスタ併合の条件を満たすようになることがある。 First, as a result of merging clusters A and B, the distance between the vector of the merged cluster AB and the vector of cluster D may grow so that the cluster merge condition is no longer satisfied, while a cluster E that did not meet the merge condition at all may come close enough to the merged cluster AB to satisfy it.

次に、クラスタ併合プロセスを並列進行する場合、クラスタＡとＢがクラスタＢに併合されると同時に、クラスタＢとＣがクラスタＢに併合されることがある。 Next, when the cluster merge process is carried out in parallel, clusters A and B may be merged into cluster B, and at the same time, clusters B and C may be merged into cluster B.

このような２つの問題を解決するためのクラスタ併合方法の一例は、図９に示すとおりである。 An example of a cluster merging method for solving these two problems is shown in FIG.

図９を参照すると、クラスタ生成部３２０は、すべてのクラスタを対象に、各クラスタ別に該当のクラスタ以後に生成されたクラスタと比較し、そのうちベクトル距離が最も近いクラスタ（短期クラスタあるいはイシュークラスタ）だけを併合対象として選定する（Ｓ９０１）。この段階９０１は、並列処理方式で行われる。 Referring to FIG. 9, the cluster generation unit 320 compares all clusters with the clusters generated after the corresponding cluster for each cluster, and only the cluster with the shortest vector distance (short-term cluster or issue cluster). Is selected as the merge target (S901). This step 901 is performed by a parallel processing method.

クラスタ生成部３２０は、段階９０１で併合対象として選定されたクラスタを、直列処理方式によって１つ１つ順に併合する（Ｓ９０２）。既に併合されたクラスタは、次の過程で併合対象として再選定されたとしても無視し、併合後には併合対象として選定されないように除外してよい。 The cluster generation unit 320 merges the clusters selected as merge targets in step 901 one by one by a serial processing method (S902). Clusters that have already been merged may be ignored even if they are reselected as targets for merging in the next process, and may be excluded so that they will not be selected as targets for merging after merging.

クラスタ生成部３２０は、段階９０２の実行結果、併合されるクラスタの件数が０であるかを判断し（Ｓ９０３）、併合されるクラスタが１つ以上存在する場合には、併合されたクラスタを含んだ全体クラスタを対象として再び段階９０１からの段階を繰り返す。 As a result of executing step 902, the cluster generation unit 320 determines whether the number of clusters to be merged is 0 (S903), and if there is one or more clusters to be merged, the cluster generation unit 320 includes the merged clusters. However, the steps from step 901 are repeated again for the entire cluster.

クラスタ生成部３２０は、併合されるクラスタの個数が０になるまで前記過程９０１～９０２を繰り返した後、併合されるクラスタの件数が０になれば、クラスタ併合を終了させる（Ｓ９０４）。 The cluster generation unit 320 repeats the steps 901 to 902 until the number of clusters to be merged becomes 0, and then ends the cluster merger when the number of clusters to be merged becomes 0 (S904).
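The loop of steps S901 to S904 can be sketched as follows. Several details are assumptions made for the example rather than statements of the patent: clusters are dicts with hypothetical keys `id`, `vector`, and `count`, listed in creation order; a target is only selected if it also meets a similarity threshold; the larger cluster keeps its ID; and a thread pool stands in for whatever parallel execution the system actually uses.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def similarity(a, b):
    """Similarity defined as 1 / (1 + Euclidean distance), as in the text."""
    return 1.0 / (1.0 + math.dist(a, b))


def nearest_later(i, clusters, threshold):
    """S901 for cluster i: index of the closest later-created cluster that
    also meets the threshold, or None (the threshold check is an assumption)."""
    best, best_sim = None, 0.0
    for j in range(i + 1, len(clusters)):
        sim = similarity(clusters[i]["vector"], clusters[j]["vector"])
        if sim >= threshold and sim > best_sim:
            best, best_sim = j, sim
    return best


def merge_pair(a, b):
    """Merge the smaller cluster into the larger one (weighted vector average)."""
    big, small = (a, b) if a["count"] >= b["count"] else (b, a)
    total = big["count"] + small["count"]
    vector = [(x * big["count"] + y * small["count"]) / total
              for x, y in zip(big["vector"], small["vector"])]
    return {"id": big["id"], "vector": vector, "count": total}


def merge_until_stable(clusters, threshold=0.71):
    """Repeat select-then-merge passes until a pass merges nothing."""
    clusters = list(clusters)
    while True:
        # S901: target selection is independent per cluster, so it can run in parallel.
        with ThreadPoolExecutor() as pool:
            targets = list(pool.map(
                lambda i: nearest_later(i, clusters, threshold),
                range(len(clusters))))
        # S902: apply merges one at a time, skipping clusters already merged.
        consumed, merged = set(), {}
        for i, j in enumerate(targets):
            if j is None or i in consumed or j in consumed:
                continue
            merged[i] = merge_pair(clusters[i], clusters[j])
            consumed.update((i, j))
        # S903/S904: stop once no cluster was merged in this pass.
        if not merged:
            return clusters
        clusters = [merged.get(i, c) for i, c in enumerate(clusters)
                    if i in merged or i not in consumed]
```

Selecting targets in parallel only reads cluster data, and the merges themselves are applied serially against vectors recomputed on the next pass, which is how the sketch avoids both problems described above.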

本実施形態では、クラスタ併合の以前と以後の結果に影響を及ぼさずに計算量が多い部分を並列処理することにより、迅速かつ安定的にクラスタ併合を実行することができる。 In the present embodiment, cluster merging can be executed quickly and stably by parallel processing a portion having a large amount of calculation without affecting the results before and after the cluster merging.

クラスタ併合方法は、同じ記事の存在可否、ベクトルユークリッド、ベクトルコサインなどのように類似度を示す多様な特徴を活用してよく、広く利用されている併合方式のうちの少なくとも１つの方式を利用してよく、各方式による併合結果で記事件数、クラスタ数、クラスタ内の記事の類似度あるいは関連性などのような要件に応じて選択的に適用されてよい。 The cluster merging method may utilize various features indicating similarity such as existence / nonexistence of the same article, vector Euclidean, vector cosine, etc., and uses at least one of the widely used merging methods. It may be selectively applied according to the requirements such as the number of articles, the number of clusters, the similarity or relevance of articles in the cluster, etc. in the merged result by each method.

本実施形態では、記事を提供するサービスを通じて揮発性であるクラスタデータと破片化されたクラスタデータを併合することにより、中／長期的に関連のあるイシュー単位のクラスタグループとしてイシュークラスタを生成することができる。 In this embodiment, an issue cluster is generated as a medium / long-term related issue-based cluster group by merging volatile cluster data and fragmented cluster data through a service that provides articles. Can be done.

従来には、長期にわたって収集した記事をクラスタリングする場合、テキストを中心とし、テキストが似ている記事を束ね、結局は似たような文書の集合体を生成するだけであったが、本発明では時間を基準として束ねたクラスタをクラスタ間の類似度に基づいて併合していくことにより、イシュー単位のクラスタグループを生成することができ、文書上の内容が変わるが脈絡的には共通するイシューをもつ文書の集合体を生成することができる。クラスタ間の類似度に基づいてクラスタを併合してイシュークラスタを生成することにより、記事間の類似度では併合が困難であった記事を、共通するイシューをもつ１つのクラスタとして束ねることができる。 Conventionally, clustering articles collected over a long period was text-centered: articles with similar text were bundled, which ultimately produced only collections of similar documents. In the present invention, clusters bundled on the basis of time are merged according to the similarity between clusters, so that issue-level cluster groups can be generated, yielding collections of documents whose content changes but which share a common issue in context. By merging clusters based on the similarity between clusters to generate issue clusters, articles that were difficult to merge based on article-to-article similarity can be bundled into one cluster sharing a common issue.

図４を参照すると、段階４５０で、クラスタ表示部３３０は、段階４４０で最終併合されたイシュークラスタの記事を、記事提供サービスを通じて表示してよい。クラスタ表示部３３０は、イシュークラスタを利用した記事タイムラインとしてインターネット上の記事のうちでイシュークラスタに含まれた記事に対するタイムラインを提供してよい。言い換えれば、クラスタ表示部３３０は、イシュークラスタを利用してイシュークラスタに含まれた記事を時系列で表示してよい。 Referring to FIG. 4, at step 450, the cluster display unit 330 may display the article of the issue cluster finally merged at step 440 through the article providing service. The cluster display unit 330 may provide a timeline for articles included in the issue cluster among articles on the Internet as an article timeline using the issue cluster. In other words, the cluster display unit 330 may display the articles included in the issue cluster in chronological order by using the issue cluster.

クラスタ表示部３３０は、イシュークラスタのうちで事前に定められた表示条件を満たす少なくとも１つのイシュークラスタを、サービスを通じて表示してよい。一例として、クラスタ表示部３３０は、コンテンツ提供者（ｃｏｎｔｅｎｔｐｒｏｖｉｄｅｒ）（例えば、報道機関など）によって特定された記事あるいはサービスを利用するユーザの選択や設定によって特定された記事が存在するイシュークラスタをサービス表示対象として決定してよい。言い換えれば、クラスタ表示部３３０は、報道機関が取り扱う主要イシューやユーザ個人が関心のあるイシューに対応するイシュークラスタに対し、サービスページ上にイシュータイムラインを表示してよい。他の例として、クラスタ表示部３３０は、文書の件数、クラスタリング後の経過時間、コメント数のうちの少なくとも１つが事前に定められた条件に該当するイシュークラスタをサービス表示対象として決定してよい。イシュークラスタの表示条件は、例えば、クラスタに含まれた記事のうち最初にクラスタリングされた記事が２日以上経過した場合、あるいはクラスタに含まれた記事の総数が２００件以上の場合、コメント数が１００件以上の記事が１日あたり３件以上であり該当の条件の日が２日以上である場合などが含まれてよい。上述したイシュークラスタの表示条件とともに、報道機関イシューや個人イシューを結合させてサービス表示対象を決定することも可能である。 The cluster display unit 330 may display, through the service, at least one issue cluster that satisfies a predetermined display condition. As one example, the cluster display unit 330 may select as a service display target an issue cluster that contains an article specified by a content provider (for example, a news organization) or an article specified by the selection or settings of a user of the service. In other words, the cluster display unit 330 may display an issue timeline on the service page for issue clusters corresponding to the main issues covered by news organizations or to issues of interest to an individual user. As another example, the cluster display unit 330 may select as a service display target an issue cluster for which at least one of the number of documents, the time elapsed since clustering, and the number of comments meets a predetermined condition. The display conditions for an issue cluster may include, for example, a case where 2 or more days have passed since the first article in the cluster was clustered, a case where the total number of articles in the cluster is 200 or more, and a case where there are 3 or more articles per day with 100 or more comments on 2 or more days. The issue cluster display conditions described above may also be combined with news organization issues or individual issues to determine the service display target.
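The example display conditions can be sketched as a single predicate. The data layout is an assumption for illustration: each article is a dict with hypothetical keys `clustered_at` (a datetime) and `comments` (an int), and an article is attributed to the calendar day on which it was clustered. The thresholds are the example values given in the text.

```python
from collections import Counter
from datetime import datetime, timedelta


def is_displayable(articles, now):
    """Return True if an issue cluster meets any of the example display conditions."""
    if not articles:
        return False
    # Condition 1: the first clustered article is at least 2 days old.
    oldest = min(a["clustered_at"] for a in articles)
    if now - oldest >= timedelta(days=2):
        return True
    # Condition 2: the cluster holds 200 or more articles.
    if len(articles) >= 200:
        return True
    # Condition 3: on 2 or more days, 3 or more articles had 100+ comments.
    hot_per_day = Counter(a["clustered_at"].date()
                          for a in articles if a["comments"] >= 100)
    hot_days = sum(1 for n in hot_per_day.values() if n >= 3)
    return hot_days >= 2
```

Filters for issues chosen by a news organization or by an individual user could be combined with this predicate, for example by requiring both to hold.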

上述イシュークラスタを利用したイシュータイムラインのインタフェース画面の例は、図１０～１２に示すとおりである。 An example of the interface screen of the issue timeline using the above-mentioned issue cluster is as shown in FIGS. 10 to 12.

図１０を参照すると、クラスタ表示部３３０は、イシュークラスタ、すなわち、中／長期的に連関性の高い記事を時系列で表示するためのイシュータイムライン画面１０００を、ニュースサービスを通じて表示してよい。イシュータイムライン画面１０００には、イシュークラスタに含まれた単位時間（例えば、１ｄａｙ）別に記事件数をグラフ形態で示す時間領域１０１０、およびイシュークラスタに含まれた記事リストを単位時間別に区分して示す記事領域１０２０が含まれてよい。 Referring to FIG. 10, the cluster display unit 330 may display an issue cluster, that is, an issue timeline screen 1000 for displaying articles having high relevance in the medium / long term in chronological order through a news service. On the issue timeline screen 1000, the time domain 1010 showing the number of articles in graph form for each unit time (for example, 1 day) included in the issue cluster, and the article list included in the issue cluster are shown separately for each unit time. Article area 1020 may be included.

イシュータイムライン画面１０００は、中／長期的に連関性の高い記事を一目で把握することができ、時間領域１０１０と記事領域１０２０とが有機的に連結された構造によって記事確認のためのナビゲーション機能を提供してよい。 The issue timeline screen 1000 can grasp articles that are highly related in the medium / long term at a glance, and has a navigation function for article confirmation due to the structure in which the time domain 1010 and the article domain 1020 are organically connected. May be provided.

イシュータイムライン画面１０００の一部領域、例えば上端には、イシュークラスタが生成された期間情報１００１、イシュークラスタと関連するイシュータイトル１００２などが含まれてよい。イシュータイトル１００２は、イシュークラスタに含まれた少なくとも１つの記事の題目やタグなどを活用して生成されるものであって、例えば、コメント数が最も多い記事の題目がイシュータイトル１００２として表示されてもよいし、あるいは記事の題目やタグなどに主に登場するキーワードの組み合わせによってイシュータイトル１００２が生成されて表示されてもよい。 A part of the issue timeline screen 1000, for example, the upper end may include the period information 1001 in which the issue cluster is generated, the issue title 1002 associated with the issue cluster, and the like. The issue title 1002 is generated by utilizing the title or tag of at least one article included in the issue cluster. For example, the title of the article having the largest number of comments is displayed as the issue title 1002. Alternatively, the issue title 1002 may be generated and displayed by a combination of keywords that mainly appear in the title or tag of the article.

時間領域１０１０は、イシュークラスタに含まれた各単位時間の時間情報と記事件数が含まれるものであって、例えば、一軸には時間情報を示し、他の軸には記事件数を棒の長さで示す棒グラフの形態が表示されてよい。このとき、記事件数により、グラフバーのディスプレイ要素（例えば、色や明るさなど）が区分されて表示されてよい。例えば、記事件数が多いほどグラフバーの色は濃く、記事件数が少ないほどグラフバーの色は薄く表示されてよい。 The time domain 1010 includes the time information and the number of articles of each unit time included in the issue cluster. For example, one axis shows the time information and the other axis shows the number of articles and the length of the bar. The form of the bar graph shown by may be displayed. At this time, the display elements (for example, color, brightness, etc.) of the graph bar may be classified and displayed according to the number of articles. For example, the larger the number of articles, the darker the color of the graph bar, and the smaller the number of articles, the lighter the color of the graph bar.

記事領域１０２０は、１つのイシュークラスタとして併合された単位時間別の記事リストが日付順、記事件数順などの一定の基準によって整列されてよく、例えば、ここ最近のクラスタが最上位に表示されてもよいし、あるいは最も多くの記事を含んだクラスタが最上位に表示されてもよい。 In the article area 1020, the article list by unit time merged as one issue cluster may be arranged by a certain criterion such as date order, article number order, for example, the recent cluster is displayed at the top. Alternatively, the cluster containing the most articles may be displayed at the top.
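The data behind the time domain 1010 and the article area 1020 can be derived from one grouping pass, sketched below. The field name `published_at`, the use of one day as the unit time, and the newest-first ordering of the article area are assumptions chosen for the example; the text also allows ordering by article count.

```python
from collections import defaultdict
from datetime import datetime


def build_timeline(articles):
    """Group an issue cluster's articles by day.

    Returns (time_area, article_area):
      time_area    - list of (day, article_count), oldest day first, for the bar graph
      article_area - list of (day, articles), newest day first, newest article first
    """
    by_day = defaultdict(list)
    for article in articles:
        by_day[article["published_at"].date()].append(article)
    time_area = [(day, len(by_day[day])) for day in sorted(by_day)]
    article_area = [
        (day, sorted(by_day[day], key=lambda a: a["published_at"], reverse=True))
        for day in sorted(by_day, reverse=True)
    ]
    return time_area, article_area
```

The per-day counts in `time_area` could also drive the bar shading, with larger counts mapped to darker colors.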

図１１を参照すると、イシュータイムライン画面１０００において、時間領域１０１０は、一方向（例えば、左右方向）のスクロール１１０１が可能であり、記事領域１０２０は、時間領域１０１０と同一あるいは他の方向（例えば、上下方向）のスクロール１１０２が可能となるようにインタフェースを構成してよい。 Referring to FIG. 11, in the issue timeline screen 1000, the time domain 1010 can be scrolled in one direction (for example, left-right direction), and the article area 1020 is in the same direction as the time domain 1010 or in another direction (for example, for example). The interface may be configured to allow scrolling 1102 (up and down).

イシュークラスタがサービスを通じて表示されることにより、ユーザは、イシュークラスタの時間領域１０１０に対するスクロール１１０１により、該当のイシューが登場した後にクラスタリングされた全体期間内で単位時間別の記事件数を確認することができ、イシュー登場時点、最もイシューとなった時点などのように時間経過によるイシューの変化を一目で確認することができる。 Because the issue cluster is displayed through the service, the user can scroll 1101 the time domain 1010 of the issue cluster to check the number of articles per unit time over the entire period clustered since the issue first appeared, and can see at a glance how the issue changed over time, such as when it first appeared and when it drew the most coverage.

また、記事領域１０２０に対するスクロール１１０２により、イシュークラスタに含まれた単位時間別の記事リストを事前に定められた整列順に示すことができ、ユーザは、単位時間別にクラスタリングされた記事を確認することができる。記事領域１０２０には、単位時間別に一定の件数（例えば、３件）の一部記事だけを示し、ユーザから特定の日付の記事リストに対して別途の要求（例えば、もっと見る、全体表示など）が入力されれば、全体記事を示してよい。 In addition, scrolling 1102 the article area 1020 shows the article lists by unit time included in the issue cluster in a predetermined sort order, so that the user can check the articles clustered for each unit time. The article area 1020 may show only a fixed number of articles (for example, 3) for each unit time, and may show all articles when the user enters a separate request (for example, "more" or "show all") for the article list of a specific date.

記事領域１０２０に対してスクロール１１０２がなされるとき、記事リストの画面表示に合わせて時間領域１０１０が自動でスクロールされてよい。一例として、記事領域１０２０に対するスクロール１１０２により、時間領域１０１０で画面の事前に定められた基準範囲（例えば、中央線など）に表示された記事リストの単位時間グラフバーが中央に自動でスクロールされ、他の単位時間グラフバーと区別されるように表示されてよい。 When the scroll 1102 is scrolled to the article area 1020, the time domain 1010 may be automatically scrolled according to the screen display of the article list. As an example, scroll 1102 for article area 1020 automatically scrolls the unit time graph bar of the article list displayed in a predetermined reference range (eg, centerline) of the screen in time domain 1010 to the center. It may be displayed so as to be distinguished from other unit time graph bars.

図１２を参照すると、記事領域１０２０に対するスクロール１１０２により、単位時間別の記事リストを確認する過程において、時間領域１０１０が初期状態（図１０）に比べて簡略化されてよい。例えば、記事領域１０２０に対するスクロール１１０２が一定時間（例えば、１秒）以上続けば、時間領域１０１０のサイズが小さくなるか、時間領域１０１０のグラフバーが縮小された形態で表示されてよい。記事領域１０２０のスクロール１１０２によって最上端に整列された記事リストが画面に再び表示される場合、時間領域１０１０は初期状態（図１０）に復元されてよい。 Referring to FIG. 12, the time domain 1010 may be simplified as compared with the initial state (FIG. 10) in the process of confirming the article list by unit time by the scroll 1102 for the article area 1020. For example, if the scroll 1102 with respect to the article area 1020 continues for a certain period of time (for example, 1 second) or more, the size of the time domain 1010 may be reduced or the graph bar of the time domain 1010 may be displayed in a reduced form. When the article list aligned at the top edge by the scroll 1102 of the article area 1020 is displayed again on the screen, the time area 1010 may be restored to the initial state (FIG. 10).

また、記事領域１０２０に対するスクロール１１０２の方向によって時間領域１０１０の表示の可否、つまり、表示状態または隠し状態が選択的に適用されてよい。例えば、記事領域１０２０が下方向にスクロールされる場合には時間領域１０１０が隠し処理される反面、記事領域１０２０が上方向にスクロールされる場合には時間領域１０１０が再び表示されてよい。 Further, whether or not the time domain 1010 can be displayed, that is, the display state or the hidden state may be selectively applied depending on the direction of the scroll 1102 with respect to the article area 1020. For example, when the article area 1020 is scrolled downward, the time domain 1010 is hidden, while when the article area 1020 is scrolled upward, the time domain 1010 may be displayed again.

同じように、時間領域１０１０に対するスクロール１１０１時に、単位時間グラフバーの画面表示に合わせて記事領域１０２０が自動でスクロールされてよい。言い換えれば、時間領域１０１０と記事領域１０２０とが相互に連結され、一領域のスクロールに合わせて他の領域が自動でスクロールされてよい。 Similarly, at the time of scrolling 1101 with respect to the time domain 1010, the article area 1020 may be automatically scrolled according to the screen display of the unit time graph bar. In other words, the time domain 1010 and the article domain 1020 may be interconnected, and the other areas may be automatically scrolled according to the scrolling of one area.

他の例として、記事領域１０２０に対してスクロール１１０２がなされるときに時間領域１０１０が自動でスクロールされる反面、時間領域１０１０に対してスクロール１１０１がなされるときに記事領域１０２０が既存の位置で固定状態を維持することも可能であり、その反対も可能であることはもちろんである。 As another example, the time domain 1010 is automatically scrolled when the scroll 1102 is scrolled to the article area 1020, while the article area 1020 is in the existing position when the scroll 1101 is scrolled to the time domain 1010. Of course, it is possible to maintain a fixed state and vice versa.

時間領域１０１０に対してスクロール１１０１がなされるときに記事領域１０２０が自動でスクロールされずに固定状態を維持するようにインタフェースが構成された場合、時間領域１０１０で特定の日付のグラフ、例えば、７月２６日のグラフバーを選択すれば、記事領域１０２０では７月２６日付けの記事リストに自動でスクロールされて表示されてよい。時間領域１０１０では、選択された日付のグラフバーが他の単位時間グラフバーとは区別されるように表示されてよい。時間領域１０１０の初期状態（図１０）で特定の日付のグラフバーが選択された場合、選択された日付の記事リストが記事領域１０２０に表示されると同時に、時間領域１０１０は簡略化されて表示されてよい。 If the interface is configured so that the article area 1020 stays fixed rather than scrolling automatically when the time domain 1010 is scrolled 1101, selecting the graph bar of a specific date in the time domain 1010, for example the graph bar for July 26, may automatically scroll the article area 1020 to the article list dated July 26. In the time domain 1010, the graph bar of the selected date may be displayed so as to be distinguished from the other unit-time graph bars. When the graph bar of a specific date is selected while the time domain 1010 is in its initial state (FIG. 10), the article list of the selected date may be displayed in the article area 1020 and, at the same time, the time domain 1010 may be displayed in simplified form.

このように、本発明の実施形態によると、時間帯別に生成された短期クラスタをクラスタ間の類似度に基づいて併合することにより、中／長期的に関連のあるイシュー単位のクラスタグループとしてイシュークラスタを生成することができる。クラスタ間の類似度に基づいてクラスタを併合してイシュークラスタを生成することにより、時間の経過によって主題や内容が変わって記事間の類似度が低下したとしても、関連のあるイシューの記事を効果的にクラスタリングすることができる。以前に計算されたクラスタのベクトル値を利用して記事が追加されたクラスタや他のクラスタと併合されたクラスタのベクトル値を計算する方式でクラスタ併合を実行することにより、計算量を飛躍的に減らし、迅速かつ安定的なクラスタリング性能を提供することができる。 Thus, according to the embodiment of the present invention, by merging short-term clusters generated by time zone based on the similarity between clusters, issue clusters are formed as cluster groups of issue units that are related in the medium to long term. Can be generated. By merging clusters based on the similarity between clusters to create an issue cluster, articles with related issues will be effective even if the subject or content changes over time and the similarity between articles decreases. Can be clustered. By performing cluster merging by using the vector value of the previously calculated cluster to calculate the vector value of the cluster to which the article was added or the cluster merged with other clusters, the amount of calculation is dramatically increased. It can be reduced to provide fast and stable clustering performance.

さらに、本発明の実施形態によると、イシュークラスタを利用することで、中／長期的に連関性の高い記事を時系列で示すことのできる記事タイムラインを提供することができる。イシュークラスタによって単位時間に対する表示領域と各単位時間別にクラスタリングされた記事を表示することにより、長期間に連関性の高い記事を一目で把握することができ、時間領域と記事領域とが有機的に連結される構造で記事確認のためのナビゲーション機能を提供することにより、ユーザの利便性を向上させることができる。 Further, according to the embodiment of the present invention, by using the issue cluster, it is possible to provide an article timeline capable of showing articles with high relevance in the medium / long term in chronological order. By displaying the display area for each unit time and the articles clustered for each unit time by the issue cluster, it is possible to grasp the articles that are highly related for a long period of time at a glance, and the time area and the article area are organically separated. By providing a navigation function for checking articles in a connected structure, it is possible to improve user convenience.

上述した装置は、ハードウェア構成要素、ソフトウェア構成要素、および／またはハードウェア構成要素とソフトウェア構成要素との組み合わせによって実現されてよい。例えば、実施形態で説明された装置および構成要素は、プロセッサ、コントローラ、ＡＬＵ（ａｒｉｔｈｍｅｔｉｃｌｏｇｉｃｕｎｉｔ）、デジタル信号プロセッサ、マイクロコンピュータ、ＦＰＧＡ（ｆｉｅｌｄｐｒｏｇｒａｍｍａｂｌｅｇａｔｅａｒｒａｙ）、ＰＬＵ（ｐｒｏｇｒａｍｍａｂｌｅｌｏｇｉｃｕｎｉｔ）、マイクロプロセッサ、または命令を実行して応答することができる様々な装置のように、１つ以上の汎用コンピュータまたは特殊目的コンピュータを利用して実現されてよい。処理装置は、オペレーティングシステム（ＯＳ）およびＯＳ上で実行される１つ以上のソフトウェアアプリケーションを実行してよい。また、処理装置は、ソフトウェアの実行に応答し、データにアクセスし、データを記録、操作、処理、および生成してもよい。理解の便宜のために、１つの処理装置が使用されるとして説明される場合もあるが、当業者は、処理装置が複数個の処理要素および／または複数種類の処理要素を含んでもよいことが理解できるであろう。例えば、処理装置は、複数個のプロセッサまたは１つのプロセッサおよび１つのコントローラを含んでよい。また、並列プロセッサのような、他の処理構成も可能である。 The devices described above may be implemented by hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing instructions and responding to them. The processing device may run an operating system (OS) and one or more software applications that run on the OS. The processing device may also access, record, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described, but those skilled in the art will understand that the processing device may include multiple processing elements and/or multiple types of processing elements. For example, the processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

ソフトウェアは、コンピュータプログラム、コード、命令、またはこれらのうちの１つ以上の組み合わせを含んでもよく、思うままに動作するように処理装置を構成したり、独立的または集合的に処理装置に命令したりしてよい。ソフトウェアおよび／またはデータは、処理装置に基づいて解釈されたり、処理装置に命令またはデータを提供したりするために、いかなる種類の機械、コンポーネント、物理装置、コンピュータ記録媒体または装置に具現化されてよい。ソフトウェアは、ネットワークによって接続されたコンピュータシステム上に分散され、分散された状態で記録されても実行されてもよい。ソフトウェアおよびデータは、１つ以上のコンピュータ読み取り可能な記録媒体に記録されてよい。 Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, computer recording medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over computer systems connected by a network and may be recorded or executed in a distributed manner. Software and data may be recorded on one or more computer-readable recording media.

実施形態に係る方法は、多様なコンピュータ手段によって実行可能なプログラム命令の形態で実現されてコンピュータ読み取り可能な媒体に記録されてよい。このとき、媒体は、コンピュータ実行可能なプログラムを継続して記録するものであっても、実行またはダウンロードのために一時記録するものであってもよい。また、媒体は、単一または複数のハードウェアが結合した形態の多様な記録手段または格納手段であってよく、あるコンピュータシステムに直接接続する媒体に限定されることはなく、ネットワーク上に分散して存在するものであってもよい。媒体の例としては、ハードディスク、フロッピディスク、磁気テープのような磁気媒体、ＣＤ－ＲＯＭ、ＤＶＤのような光媒体、フロプティカルディスク（ｆｌｏｐｔｉｃａｌｄｉｓｋ）のような光磁気媒体、およびＲＯＭ、ＲＡＭ、フラッシュメモリなどを含み、プログラム命令が記録されるように構成されたものであってよい。また、媒体の他の例として、アプリケーションを配布するアプリケーションストアやその他の多様なソフトウェアを供給または配布するサイト、サーバなどで管理する記録媒体または格納媒体も挙げられる。 The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The medium may store a computer-executable program persistently or may store it temporarily for execution or download. The medium may be any of various recording or storage means in the form of a single piece of hardware or a combination of several pieces of hardware; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like, configured to store program instructions. Other examples of the medium include recording or storage media managed by application stores that distribute applications, or by sites and servers that supply or distribute various other software.

Although the embodiments have been described above with reference to limited embodiments and drawings, those skilled in the art can make various modifications and variations from the above description. For example, appropriate results can be achieved even if the described techniques are performed in an order different from that of the described method, and/or if components such as the described systems, structures, devices, and circuits are coupled or combined in a form different from that of the described method, or are substituted or replaced by other components or equivalents.

Accordingly, other embodiments, insofar as they are equivalent to the claims, also fall within the scope of the appended claims.

222: Processor
310: Cluster collection unit
320: Cluster generation unit
330: Cluster display unit

Claims

A document timeline providing method executed by a computer system,
wherein the computer system comprises at least one processor configured to execute computer-readable instructions contained in a memory,
the document timeline providing method comprising:
creating, by the at least one processor, an issue cluster as a cluster group by merging clusters, each composed of similar documents, based on the similarity between the clusters; and displaying, by the at least one processor, using the issue cluster, a timeline for the documents included in the issue cluster, the timeline including a time area in which the number of documents included in the issue cluster per unit time is displayed in graph form and a document area in which a list of the documents included in the issue cluster is displayed per unit time,
wherein, upon scrolling of the document area, the time area is displayed in simplified form by reducing the size of the time area or of the graph in the time area.
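
The two data steps recited in this claim, merging similar clusters into an issue cluster and counting documents per unit time for the graph, can be illustrated with a minimal sketch. The `Cluster` structure, the Jaccard keyword similarity, the 0.5 threshold, and the choice of one day as the unit time are all assumptions made for illustration; the claim does not specify a similarity measure or a merging procedure.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Cluster:
    """A cluster of similar documents (hypothetical structure)."""
    keywords: frozenset   # representative terms; an assumed similarity feature
    doc_dates: list       # publication date of each member document


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_issue_clusters(clusters, threshold=0.5):
    """Group clusters whose pairwise similarity reaches the threshold.

    Union-find is used so that similarity chains (A~B, B~C) end up in
    one issue cluster. Returns a list of lists of Cluster objects.
    """
    parent = list(range(len(clusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if jaccard(clusters[i].keywords, clusters[j].keywords) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for idx, cluster in enumerate(clusters):
        groups.setdefault(find(idx), []).append(cluster)
    return list(groups.values())


def counts_per_day(issue_cluster):
    """Number of documents per unit time (here one day), sorted by date."""
    counter = Counter(d for c in issue_cluster for d in c.doc_dates)
    return sorted(counter.items())


a = Cluster(frozenset({"election", "vote", "poll"}), [date(2020, 8, 1), date(2020, 8, 2)])
b = Cluster(frozenset({"election", "vote", "debate"}), [date(2020, 8, 2)])
c = Cluster(frozenset({"weather", "rain"}), [date(2020, 8, 3)])
issues = build_issue_clusters([a, b, c])
```

In this sketch `issues` holds two issue clusters (`a` and `b` merged, `c` alone), and `counts_per_day(issues[0])` yields the per-day counts that a bar graph in the time area could plot.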

A document timeline providing method executed by a computer system,
wherein the computer system comprises at least one processor configured to execute computer-readable instructions contained in a memory,
the document timeline providing method comprising:
creating, by the at least one processor, an issue cluster as a cluster group by merging clusters, each composed of similar documents, based on the similarity between the clusters; and
displaying, by the at least one processor, using the issue cluster, a timeline for the documents included in the issue cluster, the timeline including a time area in which the number of documents included in the issue cluster per unit time is displayed in graph form and a document area in which a list of the documents included in the issue cluster is displayed per unit time,
wherein a displayed state or a hidden state of the time area is selectively applied according to the scrolling direction with respect to the document area.
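
The direction-dependent display state recited above can be sketched as a small state holder. The mapping used here (scrolling down hides the time area, scrolling up shows it again) is an assumption; the claim only states that the state depends on the scrolling direction. The class and method names are hypothetical.

```python
class TimeAreaVisibility:
    """Tracks whether the time area is shown, based on scroll direction.

    Assumption: an increasing offset (scrolling down the document list)
    hides the time area and a decreasing offset shows it again.
    """

    def __init__(self):
        self.last_offset = 0
        self.visible = True

    def on_scroll(self, offset):
        """Update visibility from a new scroll offset of the document area."""
        if offset > self.last_offset:
            self.visible = False      # moving down the document list
        elif offset < self.last_offset:
            self.visible = True       # moving back up
        self.last_offset = offset     # an unchanged offset keeps the state
        return self.visible
```

A UI layer would call `on_scroll` from its scroll event handler and show or hide the graph according to the returned flag.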

The document timeline providing method according to claim 1 or 2,
wherein the displaying step comprises determining, as a display target, an issue cluster containing a document specified by a user's selection or setting, or a document specified by a content provider.

The document timeline providing method according to any one of claims 1 to 3,
wherein the displaying step comprises determining, as a display target, an issue cluster for which at least one of the number of documents, the elapsed time after clustering, and the number of comments meets a predetermined condition.
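
Selecting display targets by number of documents, elapsed time after clustering, and number of comments could be sketched as a threshold check. The field names, the threshold values, and the direction of each comparison (for example, that a recently formed cluster qualifies) are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class IssueClusterStats:
    """Summary figures for one issue cluster (hypothetical structure)."""
    name: str
    doc_count: int
    hours_since_clustering: float
    comment_count: int


def is_display_target(stats, min_docs=50, max_hours=72.0, min_comments=100):
    """True if at least one criterion meets its (assumed) threshold."""
    return (
        stats.doc_count >= min_docs
        or stats.hours_since_clustering <= max_hours
        or stats.comment_count >= min_comments
    )


def select_display_targets(all_stats):
    """Keep only the issue clusters that qualify for display."""
    return [s for s in all_stats if is_display_target(s)]
```

Because the criteria are combined with `or`, an issue cluster qualifies as soon as any single figure meets its threshold, which mirrors the "at least one of" wording.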

The document timeline providing method according to any one of claims 1 to 4, wherein a navigation function is provided through a structure in which the time area and the document area are organically linked.

The document timeline providing method according to any one of claims 1 to 5, wherein the time area is scrollable in one direction, and the document area includes an interface scrollable in a direction different from that of the time area.

The document timeline providing method according to any one of claims 1 to 6, wherein the time area is automatically scrolled to match the portion of the document list shown on screen as the document area is scrolled.
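
The automatic scrolling of the time area can be sketched in two steps: find which unit-time section of the document list is currently on screen, then compute a scroll position for the graph that brings the matching bar into view. The pixel layout data, the horizontal orientation of the time area, and the centering policy are assumptions; the function names are hypothetical.

```python
from bisect import bisect_right


def unit_time_at_offset(section_tops, offset):
    """Index of the unit-time section visible at the given scroll offset.

    section_tops[i] is the top pixel position of the i-th unit-time
    section in the document area, in ascending order (assumed layout data).
    """
    return max(bisect_right(section_tops, offset) - 1, 0)


def time_area_scroll(bar_index, bar_width, viewport_width):
    """Horizontal scroll position that centers the given graph bar.

    Clamped at 0 so the graph never scrolls past its left edge.
    """
    center = bar_index * bar_width + bar_width / 2
    return max(center - viewport_width / 2, 0)


tops = [0, 400, 900]                      # three unit-time sections
visible = unit_time_at_offset(tops, 450)  # section showing at offset 450
graph_x = time_area_scroll(visible, bar_width=40, viewport_width=200)
```

Binary search keeps the lookup cheap enough to run on every scroll event, even for timelines spanning many unit times.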

The document timeline providing method according to any one of claims 1 to 7, wherein, when a graph bar for a particular unit time is selected in the time area, the document area is scrolled to display the document list for the selected unit time.
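
Jumping from a selected graph bar to the matching part of the document list can be sketched as an offset calculation over the per-unit-time sections. The fixed row and header heights and the function name are assumptions; a real layout would likely measure rendered elements instead.

```python
def scroll_target_for_bar(docs_per_unit, selected_unit, row_height=80, header_height=32):
    """Vertical offset of the selected unit time's section in the document area.

    docs_per_unit is an ordered list of (unit_label, document_count) pairs
    in the order the sections are shown. Layout constants are assumed.
    """
    offset = 0
    for unit, count in docs_per_unit:
        if unit == selected_unit:
            return offset
        # Skip this section: one header plus one row per document.
        offset += header_height + count * row_height
    raise KeyError(selected_unit)


docs = [("2020-08-01", 2), ("2020-08-02", 3), ("2020-08-03", 1)]
target = scroll_target_for_bar(docs, "2020-08-02")
```

The document area would then be scrolled to `target` so that the list for the selected unit time starts at the top of the view.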

A computer program for causing a computer system to execute the document timeline providing method according to any one of claims 1 to 8.

A non-transitory computer-readable recording medium on which is recorded a program for causing a computer to execute the document timeline providing method according to any one of claims 1 to 8.

A computer system comprising:
at least one processor configured to execute computer-readable instructions contained in a memory,
wherein the at least one processor includes:
a cluster generation unit that creates an issue cluster as a cluster group by merging clusters, each composed of similar documents, based on the similarity between the clusters; and a cluster display unit that displays, using the issue cluster, a timeline for the documents included in the issue cluster, the timeline including a time area in which the number of documents included in the issue cluster per unit time is displayed in graph form and a document area in which a list of the documents included in the issue cluster is displayed per unit time,
wherein, upon scrolling of the document area, the time area is displayed in simplified form by reducing the size of the time area or of the graph in the time area.
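
The simplified display of the time area during scrolling can be sketched as a height that shrinks with the scroll offset of the document area. The linear interpolation and all pixel values are assumptions; the claim only states that the time area or its graph is reduced in size.

```python
def time_area_height(scroll_offset, full_height=160, compact_height=48, collapse_distance=200):
    """Height of the time area for a given scroll offset of the document area.

    The graph shrinks linearly from full_height to compact_height over the
    first collapse_distance pixels of scrolling (all values are assumed).
    """
    progress = min(max(scroll_offset / collapse_distance, 0.0), 1.0)
    return full_height - (full_height - compact_height) * progress
```

With these defaults the time area is 160 pixels tall at the top of the list, 104 pixels after 100 pixels of scrolling, and stays at 48 pixels once the user has scrolled 200 pixels or more.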

A computer system comprising:
at least one processor configured to execute computer-readable instructions contained in a memory,
wherein the at least one processor includes:
a cluster generation unit that creates an issue cluster as a cluster group by merging clusters, each composed of similar documents, based on the similarity between the clusters; and
a cluster display unit that displays, using the issue cluster, a timeline for the documents included in the issue cluster, the timeline including a time area in which the number of documents included in the issue cluster per unit time is displayed in graph form and a document area in which a list of the documents included in the issue cluster is displayed per unit time,
wherein a displayed state or a hidden state of the time area is selectively applied according to the scrolling direction with respect to the document area.

The computer system according to claim 11 or 12,
wherein the cluster display unit determines, as a display target, an issue cluster containing a document specified by a user's selection or setting, or a document specified by a content provider.

The computer system according to any one of claims 11 to 13,
wherein the cluster display unit determines, as a display target, an issue cluster for which at least one of the number of documents, the elapsed time after clustering, and the number of comments meets a predetermined condition.

The computer system according to any one of claims 11 to 14, wherein a navigation function is provided through a structure in which the time area and the document area are organically linked.

The computer system according to any one of claims 11 to 15, wherein the time area is scrollable in one direction, and the document area includes an interface scrollable in a direction different from that of the time area.