JP5928091B2

JP5928091B2 - Tag group classification method, apparatus, and data mashup method, apparatus

Info

Publication number: JP5928091B2
Application number: JP2012079208A
Authority: JP
Inventors: ジャン・ジュン; ジョォン・チャオリアン; ワン・ジュロォン; 憲二大木; 昌弘田中; 照宣粂; 昭彦松尾
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2011-04-19
Filing date: 2012-03-30
Publication date: 2016-06-01
Anticipated expiration: 2032-03-30
Also published as: CN102750289B; JP2012226740A; CN102750289A

Description

The present invention relates to data processing and, more specifically, to a tag group classification method and apparatus and to a data mashup method and apparatus.

Various data format specifications currently exist for describing data, for example XML (eXtensible Markup Language), JSON (JavaScript (registered trademark) Object Notation), and CSV (Comma Separated Values). Each specification defines tags that describe the meaning of the data content. For list-type data such as a news list containing several news items, a group of tags such as title, pubdate (publication time), and author can be defined to describe each news item. Likewise, for a schedule table containing several schedule entries, a group of tags such as starttime (start time), endtime (end time), attendees (participants), and location can be defined to describe each entry. Such tags make it convenient to publish and access data content.

However, different data format specifications may use different tags to describe data content with the same or a similar meaning. For example, the data content "data creator" may be tagged "author", "writer", or "creator" depending on the specification. There is therefore a desire to recognize data content whose same or similar meaning is described with different tags, and to describe that content with unified tags, so that data content having the same or similar meaning can be mashed up.

In one conventional technique, whether multiple pieces of data content are the same or similar is determined by directly comparing the data content itself. Because the data content is large, direct comparison requires a large amount of computation, and the accuracy of the determination is often poor.

Another conventional technique determines whether the data content described by two tags is the same or similar by comparing the two tags themselves. In practice, however, many different data format specifications exist, each with its own tags. Comparing one tag with another makes it difficult to take the various features of the different tags into account together, so the accuracy of the determination suffers.

As noted above, for a news list containing several news items, a group of tags (hereinafter a "tag group") such as title, pubdate (publication time), and author can be defined to describe the content of one news item. In general, therefore, one piece of data content is defined by a tag group containing several tags that describe it. To determine whether multiple pieces of data content have the same or similar meaning, it is necessary to determine comprehensively whether the tag groups describing them are the same or similar. Comparing individual tags alone makes it difficult to determine whether data content described by tag groups has the same or similar meaning.

In view of the above problems, the applicant recognized that data content having the same or similar meaning should be identified by comparing whether tag groups are the same or similar. The basic idea of the present invention is to first divide identical or similar tag groups into the same class and then compare each newly discovered tag group with the existing classes of tag groups. Because all tag groups in a class are the same or similar, a class of tag groups reflects the various features of its member tag groups together. Comparing a tag group with a class of tag groups therefore allows sameness or similarity between tag groups to be determined more accurately.

An object of the present invention is to provide a tag group classification method and apparatus and a data mashup method and apparatus.

According to one embodiment of the present invention, there is provided a method in which a computer classifies tag groups, each including at least one tag and the corresponding data defined by the at least one tag. From a group of synonym tag sets, each containing synonymous tags, and a group of tag groups, each containing the tags that define the data of one data list, the computer generates elements indicating how many tags of a given tag group appear in each synonym tag set, generates from those elements a feature vector corresponding to each tag group, and classifies the tag groups into classes according to the similarity of their feature vectors.

According to at least one embodiment of the present invention, comparing the similarity between the feature vector of a tag group and the core feature vector of a class of tag groups allows sameness or similarity between tag groups to be determined more accurately and more efficiently, so that identical or similar data can be mashed up more accurately and more efficiently.

FIG. 1 is a flowchart illustrating a method for classifying tag groups according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating the specific procedure of the classification step of the method for classifying tag groups according to an embodiment of the present invention.
FIG. 3 is a block diagram illustrating an apparatus for classifying tag groups according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method for mashing up data based on tag groups according to an embodiment of the present invention.
FIG. 5 is a block diagram illustrating an apparatus for mashing up data based on tag groups according to an embodiment of the present invention.
FIG. 6 is a block diagram illustrating an exemplary structure of a computer that implements an embodiment of the present invention.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. The singular forms "a" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The word "comprising", as used in this specification, specifies the presence of the stated features, integers, steps, operations, units, and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, units, and/or components, and/or combinations thereof.

Embodiments of the present invention are described below with reference to the drawings. Note that, for clarity, the drawings and description omit the depiction and explanation of components and processes that are unrelated to the invention and known to those skilled in the art. Each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the computer or other programmable data processing apparatus, create means for implementing the functions/operations specified in the blocks of the flowcharts and/or block diagrams.

These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means that implement the functions/operations specified in the blocks of the flowcharts and/or block diagrams.

The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on it, producing a computer-implemented process, such that the instructions executed on the computer or other programmable data processing apparatus provide processes for implementing the functions/operations specified in the blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts and block diagrams may represent a module, program segment, or portion of code that includes one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or in the reverse order, depending on the functions involved. Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

A method for classifying tag groups according to an embodiment of the present invention is described below with reference to FIG. 1. FIG. 1 is a flowchart illustrating the method. The method is performed in the apparatus shown in FIG. 3.

As shown in FIG. 1, the method starts at step 100. Next, in step 102, the synonym tag set identification unit 300 of the apparatus shown in FIG. 3 identifies, among a plurality of synonym tag sets, the synonym tag set to which each tag in the tag group belongs.

A synonym tag set (S) is a set made up of a group of tags that have the same or a similar meaning (that is, synonymous tags). As an example, synonym tag sets such as the following may exist.

S1: author, creator, writer
S2: pubdate (publication time), publishdate (announcement time)
S3: URL (Uniform Resource Locator), link
S4: summary, description
S5: event, title, what
S6: starttime (start time), when
S7: where, location
...
Sn: who, attendees (participants)
Here, n is an integer of 1 or more.

The synonym tag sets above are merely examples, and other synonym tag sets may exist as required. Which tags have the same or a similar meaning can be determined in advance from practical experience. In addition, newly discovered tags with the same or a similar meaning can be added continually during use, so that the synonym tag sets are updated dynamically. The synonym tag sets may be provided, for example, in the form of a synonym dictionary. Those skilled in the art will understand that they may also be provided in other ways, such as in a database.
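As a minimal sketch of the "synonym dictionary" form mentioned above, the example sets S1 to S7 can be held in a mapping and inverted into a tag-to-set index. The names `SYNONYM_SETS`, `build_index`, and `add_synonym`, the lower-casing of tags, and the tag "byline" are all assumptions made for illustration; the source does not prescribe a data structure.

```python
# Hypothetical in-memory synonym dictionary built from the example sets S1-S7.
SYNONYM_SETS = {
    "S1": {"author", "creator", "writer"},
    "S2": {"pubdate", "publishdate"},
    "S3": {"URL", "link"},
    "S4": {"summary", "description"},
    "S5": {"event", "title", "what"},
    "S6": {"starttime", "when"},
    "S7": {"where", "location"},
}


def build_index(synonym_sets):
    """Map each tag (lower-cased, an assumption) to the id of its synonym set."""
    return {
        tag.lower(): set_id
        for set_id, tags in synonym_sets.items()
        for tag in tags
    }


def add_synonym(synonym_sets, set_id, tag):
    """Register a newly discovered synonym, creating the set if needed."""
    synonym_sets.setdefault(set_id, set()).add(tag)


# "byline" is an invented tag used only to show a dynamic update.
add_synonym(SYNONYM_SETS, "S1", "byline")
index = build_index(SYNONYM_SETS)
print(index["writer"], index["byline"], index["url"])
```

Rebuilding the index after each update keeps the sketch simple; a database-backed variant, which the text also allows, would replace the dictionary with a table lookup.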

A tag group (T) is a set made up of a group of tags that each define corresponding data in one data list. As an example, tag groups such as the following may exist.

T1: title, author, pubdate (publication time), summary
T2: title, publishdate (announcement time), creator, description, URL (Uniform Resource Locator)
T3: title, link, writer, description
T4: title, link, writer, description
T5: event, starttime (start time), endtime (end time), location, attendees (participants)
T6: title, starttime (start time), duration, where, attendees (participants)
...
Tp: what, where, who, when
Here, p is an integer of 1 or more.

The tag groups above are merely examples, and other tag groups may exist in actual use. For example, different data format specifications (such as XML, JSON, or CSV) may define different tag groups, and data publishers may define their own tag groups as required.

For a new tag group, the synonym tag set to which each tag in the group belongs can be identified on the basis of the synonym tag sets above. For example, following the order of the tags in tag group T1, it can be identified in turn that the tag "title" belongs to synonym tag set S5 (so the number of tags in T1 belonging to S5 is 1), the tag "author" belongs to S1 (so the number of tags in T1 belonging to S1 is 1), the tag "pubdate" belongs to S2 (so the number of tags in T1 belonging to S2 is 1), and the tag "summary" belongs to S4 (so the number of tags in T1 belonging to S4 is 1). Alternatively, following the order of the synonym tag sets S1 to Sn, it can be identified in turn that the number of tags in T1 belonging to S1 is 1, to S2 is 1, to S3 is 0, to S4 is 1, to S5 is 1, to S6 is 0, and to each of S7 through Sn is 0. In the same way, the synonym tag set among S1 to Sn to which each tag belongs can be identified for each of tag groups T2 to Tp.
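The identification in step 102 can be sketched as a lookup followed by a count. This is an illustrative sketch only: the function names `identify_sets` and `count_per_set` are hypothetical, only the example sets S1 to S7 are included, and tags with no known synonym set are simply reported as `None`, a case the text does not address.

```python
from collections import Counter

# Example synonym tag sets S1-S7 from the description.
SYNONYM_SETS = {
    "S1": {"author", "creator", "writer"},
    "S2": {"pubdate", "publishdate"},
    "S3": {"URL", "link"},
    "S4": {"summary", "description"},
    "S5": {"event", "title", "what"},
    "S6": {"starttime", "when"},
    "S7": {"where", "location"},
}
TAG_TO_SET = {tag: sid for sid, tags in SYNONYM_SETS.items() for tag in tags}


def identify_sets(tag_group):
    """Return (tag, set id) pairs in tag order; None marks an unknown tag."""
    return [(tag, TAG_TO_SET.get(tag)) for tag in tag_group]


def count_per_set(tag_group):
    """Count the tags of the group in each synonym set, in S1..S7 order."""
    counts = Counter(sid for _, sid in identify_sets(tag_group) if sid is not None)
    return {sid: counts.get(sid, 0) for sid in SYNONYM_SETS}


T1 = ["title", "author", "pubdate", "summary"]
print(identify_sets(T1))   # tag order: S5, S1, S2, S4
print(count_per_set(T1))   # set order: S1..S7
```

The two functions correspond to the two orderings described above: by tag within the group, and by synonym tag set.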

Next, the method proceeds to step 104. In step 104, the feature vector generation unit 302 of the apparatus shown in FIG. 3 generates a feature vector corresponding to the tag group. Each element of the generated feature vector corresponds to a different one of the synonym tag sets, and the value of each element is the number of tags in the tag group that belong to the synonym tag set corresponding to that element.

From the result identified in step 102, a feature vector corresponding to the tag group can be generated. For example, from the result obtained in the order of the tags in tag group T1, a feature vector A: (S5:1, S1:1, S2:1, S4:1) corresponding to T1 can be generated. In each element, the part before the colon identifies the synonym tag set to which the element corresponds, and the part after the colon is the number of tags in tag group T1 that belong to that synonym tag set. For the first element "S5:1" of feature vector A, "S5" indicates that the element corresponds to synonym tag set S5, and "1" indicates that one tag in T1 belongs to S5. Alternatively, from the result obtained in the order of synonym tag sets S1 to Sn, a feature vector A': (S1:1, S2:1, S3:0, S4:1, S5:1, S6:0, S7:0, ..., Sn:0) corresponding to T1 can be generated. The parts of each element have the same meaning as in feature vector A, so their description is omitted here. Feature vectors corresponding to each of tag groups T1 to Tp can be generated in the same way.
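The two vector forms described for step 104 can be illustrated as follows. This is a sketch under assumptions: `sparse_vector` and `dense_vector` are hypothetical names, the mapping covers only the example sets S1 to S7 (so n = 7 here), and unknown tags are ignored.

```python
# Fixed element order for the dense form; only S1-S7 are modelled here.
SET_IDS = ["S1", "S2", "S3", "S4", "S5", "S6", "S7"]
TAG_TO_SET = {
    "author": "S1", "creator": "S1", "writer": "S1",
    "pubdate": "S2", "publishdate": "S2",
    "URL": "S3", "link": "S3",
    "summary": "S4", "description": "S4",
    "event": "S5", "title": "S5", "what": "S5",
    "starttime": "S6", "when": "S6",
    "where": "S7", "location": "S7",
}


def sparse_vector(tag_group):
    """Form A: (set id, count) pairs in order of first appearance in the group."""
    counts = {}
    for tag in tag_group:
        sid = TAG_TO_SET.get(tag)
        if sid is not None:  # tags without a known synonym set are skipped
            counts[sid] = counts.get(sid, 0) + 1
    return list(counts.items())


def dense_vector(tag_group):
    """Form A': one count per synonym set, in S1..S7 order."""
    counts = dict(sparse_vector(tag_group))
    return [counts.get(sid, 0) for sid in SET_IDS]


T1 = ["title", "author", "pubdate", "summary"]
print(sparse_vector(T1))  # [('S5', 1), ('S1', 1), ('S2', 1), ('S4', 1)]
print(dense_vector(T1))   # [1, 1, 0, 1, 1, 0, 0]
```

The dense form is convenient for the similarity calculation because every tag group then has a vector of the same length with elements in the same order.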

Next, the method proceeds to step 106. In step 106, the similarity calculation unit 304 of the apparatus shown in FIG. 3 calculates the similarity between the feature vector and the core feature vector of each of at least one class. The value of each element of a class's core feature vector is the sum of the values of the corresponding elements in the feature vectors of the tag groups already classified into that class.

A class is a set made up of a group of tag groups that are the same as or similar to one another. That is, the tag groups belonging to one class are the same or similar. Whether tag groups are the same or similar can be determined, for example, from the cosine similarity between them. The process of calculating the cosine similarity between tag groups is described below.

Assume that step 104 has generated a feature vector A corresponding to tag group T1 and a feature vector B corresponding to tag group T2. Feature vector A is written (S1: f_{a1}, S2: f_{a2}, ..., Sn: f_{an}), abbreviated (f_{a1}, f_{a2}, ..., f_{an}), and feature vector B is written (S1: f_{b1}, S2: f_{b2}, ..., Sn: f_{bn}), abbreviated (f_{b1}, f_{b2}, ..., f_{bn}). Here Sn denotes the synonym tag set corresponding to the n-th element of feature vector A or B, f_{an} is the number of tags in tag group T1 that belong to Sn, and f_{bn} is the number of tags in tag group T2 that belong to Sn. The cosine similarity between feature vector A and feature vector B can be calculated by equation (1):

\[ \mathrm{Similarity}(A, B) = \frac{\sum_{k=1}^{n} f_{ak} f_{bk}}{\sqrt{\left(\sum_{k=1}^{n} f_{ak} f_{ak}\right)\left(\sum_{k=1}^{n} f_{bk} f_{bk}\right)}} \qquad (1) \]

where 1 ≤ k ≤ n and n is an integer of 1 or more.
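Equation (1) can be checked numerically with a few lines of code. The sketch below assumes dense vectors over the example sets S1 to S7, so that T1 maps to (1, 1, 0, 1, 1, 0, 0) and T2 to (1, 1, 1, 1, 1, 0, 0); the function name `cosine_similarity` and the choice to return 0.0 for an all-zero vector are assumptions, since the text does not cover that case.

```python
import math


def cosine_similarity(a, b):
    """Equation (1): dot product divided by the product of the vector norms."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    # An all-zero vector has no direction; treating it as dissimilar is an assumption.
    return dot / norm if norm else 0.0


A = [1, 1, 0, 1, 1, 0, 0]  # T1: title, author, pubdate, summary
B = [1, 1, 1, 1, 1, 0, 0]  # T2: title, publishdate, creator, description, URL
print(cosine_similarity(A, B))  # 4 / sqrt(4 * 5), roughly 0.894
```

Because all element values are non-negative counts, the result always lies between 0 and 1, with 1 meaning the two vectors point in the same direction.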

For a class made up of a group of tag groups, the core feature vector of the class can be obtained, for example, by summing the corresponding elements of the feature vectors of the tag groups in the class. For example, assume that class C contains tag groups T1 to Tm (m is an integer of 1 or more) that have already been classified into it, and that the feature vectors corresponding to tag groups T1 to Tm are A1 to Am, respectively. The core feature vector AC of class C is then given by equation (2).

\[ \mathrm{AC} = \left( \sum_{j=1}^{m} f_{a_j 1},\ \sum_{j=1}^{m} f_{a_j 2},\ \ldots,\ \sum_{j=1}^{m} f_{a_j n} \right) \qquad (2) \]

where 1 ≤ j ≤ m and m is an integer of 1 or more.
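Equation (2) is an element-wise sum, which the following sketch illustrates. Placing T1, T2, and T3 in the same class is an assumption made only for this example, as are the function name `core_feature_vector` and the restriction to the seven example sets S1 to S7.

```python
def core_feature_vector(member_vectors):
    """Equation (2): sum the corresponding elements of all member vectors."""
    if not member_vectors:
        raise ValueError("a class must contain at least one tag group")
    return [sum(column) for column in zip(*member_vectors)]


# Dense feature vectors over S1..S7 for three example tag groups.
A1 = [1, 1, 0, 1, 1, 0, 0]  # T1: title, author, pubdate, summary
A2 = [1, 1, 1, 1, 1, 0, 0]  # T2: title, publishdate, creator, description, URL
A3 = [1, 0, 1, 1, 1, 0, 0]  # T3: title, link, writer, description

AC = core_feature_vector([A1, A2, A3])
print(AC)  # [3, 2, 2, 3, 3, 0, 0]
```

Since the core vector is a plain sum, adding one more tag group to a class only requires adding its feature vector to the existing core vector.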

After the core feature vector AC of class C has been calculated by equation (2), equation (1) can be used to calculate the similarity between the feature vector ANE of a new tag group TNE and the core feature vector AC of class C. If multiple classes exist, the similarity between the feature vector ANE of the new tag group TNE and the core feature vector of each class is calculated.
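Combining the two equations, a new tag group can be scored against every existing class. In the sketch below the class names "news" and "schedule", their membership, and the use of T4 as the new tag group TNE are assumptions chosen for illustration; only the example sets S1 to S7 are modelled, so tags such as endtime, duration, and attendees contribute nothing to the vectors.

```python
import math


def cosine_similarity(a, b):
    """Equation (1) applied to two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def core_feature_vector(member_vectors):
    """Equation (2): element-wise sum of the member feature vectors."""
    return [sum(column) for column in zip(*member_vectors)]


# Hypothetical classes built from the example tag groups (vectors over S1..S7).
classes = {
    "news": core_feature_vector([
        [1, 1, 0, 1, 1, 0, 0],  # T1
        [1, 1, 1, 1, 1, 0, 0],  # T2
        [1, 0, 1, 1, 1, 0, 0],  # T3
    ]),
    "schedule": core_feature_vector([
        [0, 0, 0, 0, 1, 1, 1],  # T5: event, starttime, location
        [0, 0, 0, 0, 1, 1, 1],  # T6: title, starttime, where
    ]),
}

ANE = [1, 0, 1, 1, 1, 0, 0]  # new tag group TNE, here T4: title, link, writer, description
similarities = {name: cosine_similarity(ANE, core) for name, core in classes.items()}
print(similarities)
```

With these inputs the new tag group scores far higher against the "news" class than against the "schedule" class, which matches the intuition that T4 describes news-like content.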

Next, the method proceeds to step 108. In step 108, the tag group classification unit 306 of the apparatus shown in FIG. 3 classifies the tag group, on the basis of the calculated similarities, into a close class among the at least one class.

The cosine similarity calculated by equation (1) between the feature vector of a tag group and the core feature vector of a class indicates how similar the tag group and the class are: the larger the value, the more similar they are. The calculated similarity can therefore be used to determine whether a tag group and a class are similar and to classify the tag group into a close (that is, similar) class.

Finally, the method proceeds to step 110, where it ends.

The overall flow of the method for classifying tag groups according to an embodiment of the present invention has been described above. The specific flow of the classification step of that method is described below with reference to FIG. 2. FIG. 2 is a flowchart illustrating the specific flow of the classification step of the method for classifying tag groups according to an embodiment of the present invention.

As shown in FIG. 2, after step 106 has calculated the similarity between the feature vector of the tag group and the core feature vector of each of the classes, the method proceeds to step 200. In step 200, each calculated similarity between the feature vector of the tag group and the core feature vector of each of the at least one class is compared with a predetermined threshold. The threshold may be set in advance as required and may be adjusted as required during actual use. Adjusting the threshold controls how precisely tag groups are classified.

Assume here that three classes made up of tag groups exist, denoted C1, C2, and C3, and that their core feature vectors are A1, A2, and A3, respectively. When a new tag group TNE is discovered, its feature vector is identified as ANE, and the similarity between ANE and each of the core feature vectors A1, A2, and A3 is calculated. Suppose, for example, that cosine similarity is used and the calculated similarity values are 0.92, 0.85, and 0.79, respectively. Each of these values is then compared with the predetermined threshold.

The method then proceeds to step 202. In step 202, it is determined whether the calculated similarity between the tag group and any of the at least one class exceeds the predetermined threshold. If the result of step 202 is "NO", that is, if the tag group is not similar to any of the classes, the method proceeds to step 206. In step 206, the tag group is classified into a new class, so that the new class contains the tag group.

In the above example, let the predetermined threshold be 0.93. None of the three calculated similarity values 0.92, 0.85, and 0.79 exceeds the threshold 0.93, so the new tag group TNE is not similar to any of the current classes C1, C2, and C3. In this case, a new class C4 is created and the new tag group TNE is classified into it, so that the new class C4 contains TNE.

If the result of step 202 is "YES", the method proceeds to step 204. In step 204, it is determined whether more than one class has a similarity greater than the predetermined threshold, that is, whether the similarities between the tag group and a plurality of classes all exceed the threshold. If the result of step 204 is "NO", exactly one class has a similarity to the tag group that exceeds the threshold, and the method proceeds to step 210. In step 210, the tag group is classified into the single class whose calculated similarity exceeds the predetermined threshold.

In the above example, let the predetermined threshold be 0.90. Of the three calculated similarity values 0.92, 0.85, and 0.79, only 0.92 exceeds the threshold 0.90, so the new tag group TNE is classified into class C1, which corresponds to the similarity value 0.92.

If the result of step 204 is "YES", more than one class has a similarity to the tag group that exceeds the predetermined threshold, and the method proceeds to step 208. In step 208, the largest of the similarities that exceed the threshold is selected, and the tag group is classified into the class corresponding to that largest similarity.

In the above example, let the predetermined threshold be 0.80. Of the three calculated similarity values 0.92, 0.85, and 0.79, both 0.92 and 0.85 exceed the threshold 0.80, so the larger of the two, 0.92, is selected. The new tag group TNE is then classified into class C1, which corresponds to the largest similarity value 0.92.

After step 206, 208, or 210, the method proceeds to step 212, where it ends.
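
The decision flow of steps 200 to 212 can be condensed into a short sketch. The function name `classify` and the use of `None` to signal that a new class is needed are assumptions made for illustration. The similarity values and thresholds are the ones used in the worked example above.

```python
def classify(similarities, threshold):
    """Return the name of the chosen class, or None if a new class is needed.

    similarities maps a class name to the similarity between the tag group's
    feature vector and that class's core feature vector.
    """
    # Steps 200 and 202: keep only classes whose similarity exceeds the threshold.
    candidates = {name: s for name, s in similarities.items() if s > threshold}
    if not candidates:
        return None  # step 206: the caller creates a new class
    # Step 210 (one candidate) and step 208 (several candidates) both reduce
    # to picking the candidate with the largest similarity.
    return max(candidates, key=candidates.get)

sims = {"C1": 0.92, "C2": 0.85, "C3": 0.79}
for threshold in (0.93, 0.90, 0.80):
    print(threshold, classify(sims, threshold))
```

For the threshold 0.93 the function returns `None` (a new class such as C4 would be created), and for 0.90 and 0.80 it returns `"C1"`, matching the three cases described in the text.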

In the above description, cosine similarity was used to calculate the similarity between tag groups and the similarity between a tag group and a class made up of tag groups. Those skilled in the art will understand, however, that any other similarity calculation method may be used, provided that it can calculate the similarity between tag groups or between a tag group and a class made up of tag groups.

In the above description, the number of tag groups contained in a class grows dynamically. Each time the classification method assigns a tag group to a class, the number of tag groups in that class increases by one. Preferably, after a new tag group has been classified into a class, the core feature vector of that class is recalculated using formula (2) from the new tag group and all tag groups the class already contained, and the recalculated vector becomes the new core feature vector of the class. When another tag group is classified afterwards, its similarity is calculated against this new core feature vector. The method of this embodiment therefore takes the various features of the various tag groups into account together, and can determine whether tag groups are identical or similar more accurately and more efficiently.
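
The update of a class's core feature vector can be sketched as follows. Formula (2) is defined elsewhere in the document and is not reproduced here. The sketch assumes, in line with the description of the similarity calculation unit, that each element of the core feature vector is the sum of the corresponding elements of the member tag groups' feature vectors. The class name `TagGroupClass` and its attributes are hypothetical.

```python
class TagGroupClass:
    """A class of tag groups together with its core feature vector."""

    def __init__(self, dimension):
        self.members = []             # feature vectors of member tag groups
        self.core = [0] * dimension   # element-wise sum over all members

    def add(self, feature_vector):
        """Add one tag group and refresh the core feature vector."""
        if len(feature_vector) != len(self.core):
            raise ValueError("feature vector has the wrong dimension")
        self.members.append(list(feature_vector))
        # Adding the new vector to the running sum gives the same result as
        # recomputing the sum over all members from scratch.
        self.core = [c + x for c, x in zip(self.core, feature_vector)]

c1 = TagGroupClass(dimension=3)
c1.add([1, 0, 2])
c1.add([0, 1, 1])
```

After the two calls, `c1.core` is `[1, 1, 3]`, and the next tag group to be classified would be compared against this updated vector.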

An apparatus for classifying tag groups according to an embodiment of the present invention is described below with reference to FIG. 3. FIG. 3 is a block diagram showing an apparatus for classifying tag groups according to an embodiment of the present invention.

As shown in FIG. 3, the apparatus 312 for classifying tag groups mainly includes a synonym tag set specifying unit 300, a feature vector generation unit 302, a similarity calculation unit 304, and a tag group classification unit 306. The synonym tag set specifying unit 300 identifies, based on the plurality of synonym tag sets stored in the synonym tag set database 308, the synonym tag set to which each tag in the input tag group belongs. The feature vector generation unit 302 generates the feature vector corresponding to the input tag group. In the generated feature vector, each element corresponds to a different one of the synonym tag sets, and the value of each element is the number of tags in the tag group that belong to the synonym tag set corresponding to that element. The similarity calculation unit 304 calculates the similarity between the feature vector and the core feature vector of each of the at least one class stored in the class set database 310. Here, the value of each element of a class's core feature vector is the sum of the values of the corresponding elements in the feature vectors of the tag groups already classified into that class. The tag group classification unit 306 classifies the input tag group, based on the calculated similarities, into a similar class among the at least one class stored in the class set database 310.
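
The work of the synonym tag set specifying unit 300 and the feature vector generation unit 302 can be illustrated with a small sketch. The synonym tag sets below are hypothetical examples built around tag names mentioned in the background section (title, pubdate, author, location). The function name `build_feature_vector`, the rule that a tag belongs to at most one synonym tag set, and the choice to ignore unknown tags are assumptions made here.

```python
def build_feature_vector(tag_group, synonym_sets):
    """Count, per synonym tag set, how many tags of the group belong to it.

    synonym_sets is a list of sets of tag names. The result has one element
    per synonym tag set, in the same order.
    """
    vector = [0] * len(synonym_sets)
    for tag in tag_group:
        for index, synonyms in enumerate(synonym_sets):
            if tag in synonyms:
                vector[index] += 1
                break  # assumption: a tag belongs to at most one set
        # Tags found in no synonym set are ignored in this sketch.
    return vector

# Hypothetical synonym tag sets (stand-ins for the contents of database 308).
synonym_sets = [
    {"title", "headline", "subject"},
    {"pubdate", "date", "published"},
    {"author", "writer", "creator"},
    {"location", "place"},
]
news_tags = ["headline", "date", "writer"]
feature_vector = build_feature_vector(news_tags, synonym_sets)
```

Here `feature_vector` is `[1, 1, 1, 0]`: one tag in each of the first three synonym tag sets and none in the fourth.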

The tag group classification unit 306 includes a class specifying unit 3062. The class specifying unit 3062 determines whether each of the at least one class is a similar class according to whether the calculated similarity between the tag group and that class exceeds the predetermined threshold. If none of the at least one class is a similar class, the class specifying unit 3062 classifies the tag group into a new class. If there are a plurality of similar classes, the class specifying unit 3062 classifies the tag group into the class with the largest calculated similarity.

Those skilled in the art will understand that the plurality of synonym tag sets may be provided in other ways, for example as a synonym tag set dictionary, and that the classes may likewise be provided in other ways. The synonym tag set database 308 and the class set database 310 are stored in the storage unit 314. The storage unit 314 is, for example, a magnetic disk, a flash memory, or a mobile memory. The storage unit 314 may be included in the apparatus 312 for classifying tag groups, or may be located outside the apparatus 312 and attached to it by wired or wireless means.

Cosine similarity can be used to calculate the similarity between tag groups and the similarity between a tag group and a class made up of tag groups. Those skilled in the art will understand, however, that any other similarity calculation method may be used, provided that it can calculate the similarity between tag groups or between a tag group and a class made up of tag groups.

The apparatus 312 for classifying tag groups is in fact the apparatus counterpart of the method for classifying tag groups described above, so a detailed description is omitted here.

A method of mashing up data based on tag groups is described below with reference to FIG. 4. FIG. 4 is a flowchart showing the method of mashing up data based on tag groups. The method is performed by the apparatus shown in FIG. 5.

As shown in FIG. 4, the method starts at step 400 and then proceeds to step 402. In step 402, the classification unit 503 of the apparatus shown in FIG. 5 classifies tag groups into at least one class using the method for classifying tag groups described above. With that method, tag groups that conform to different data format specifications, or different tag groups customized by users, are dynamically divided into different classes according to the similarity between them, and the tag groups within each class are similar to one another.

The method then proceeds to step 404. In step 404, the replacement unit 505 of the apparatus shown in FIG. 5 replaces each tag of each tag group in the same class with the designated tag of the synonym tag set to which that tag belongs. Once step 402 has divided the tag groups into classes, every tag of every tag group in a class can be replaced with a uniform tag. The similar tags in a class are thereby unified into one and the same tag group, and the data previously described with the various similar tag groups can be described anew with that single tag group. This achieves a mashup of data with similar content and meaning.

The replacement of the tags of the tag groups in a class can be carried out in various ways. For example, each tag of each tag group in the class can be replaced with the designated tag of the synonym tag set to which it belongs. The designated tag may be, for example, the first or the last tag in that synonym tag set. Alternatively, the usage frequency of each synonym tag in the synonym tag set can be counted over all tag groups in the class, and the most frequently used synonym tag taken as the designated tag. Those skilled in the art will understand that the replacement may be carried out in other ways, provided that the designated tags after replacement define the corresponding data uniformly.
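
The frequency-based choice of the designated tag can be sketched as follows. This is one possible reading of the paragraph above, not a definitive implementation: the function names `designated_tags` and `unify`, the example tag names, and the alphabetical tie-break are assumptions introduced for illustration.

```python
from collections import Counter

def designated_tags(tag_groups, synonym_sets):
    """Map every synonym to the most frequently used tag of its synonym set.

    tag_groups are the tag groups of a single class.
    """
    usage = Counter(tag for group in tag_groups for tag in group)
    chosen = {}
    for synonyms in synonym_sets:
        used = [tag for tag in synonyms if usage[tag] > 0]
        if not used:
            continue  # no tag of this set occurs in the class
        # Highest usage first; ties are broken alphabetically (an assumption)
        # so that the result is deterministic.
        best = min(used, key=lambda tag: (-usage[tag], tag))
        for tag in synonyms:
            chosen[tag] = best
    return chosen

def unify(tag_groups, synonym_sets):
    """Replace each tag with the designated tag of its synonym set."""
    mapping = designated_tags(tag_groups, synonym_sets)
    return [[mapping.get(tag, tag) for tag in group] for group in tag_groups]

# Hypothetical class containing three similar tag groups.
groups = [["title", "pubdate"], ["headline", "pubdate"], ["title", "date"]]
sets_ = [{"title", "headline"}, {"pubdate", "date"}]
unified = unify(groups, sets_)
```

In this example `title` and `pubdate` are each used twice, so all three tag groups become `["title", "pubdate"]` and the data they describe can be merged under one tag group.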

The method then proceeds to step 406, where it ends.

An apparatus for mashing up data based on tag groups is described below with reference to FIG. 5. FIG. 5 is a block diagram showing the apparatus for mashing up data based on tag groups.

As shown in FIG. 5, the apparatus 501 for mashing up data based on tag groups mainly includes a classification unit 503 and a replacement unit 505. The classification unit 503 uses the apparatus for classifying tag groups described above to classify the tag groups in the input data into at least one class. The replacement unit 505 replaces each tag of each tag group in the same class with the designated tag of the synonym tag set to which that tag belongs. The similar tags in a class are thereby unified into one and the same tag group, and the input data can be described anew with that single tag group. This achieves a mashup of data with similar content and meaning.

The apparatus 501 for mashing up data based on tag groups is in fact the apparatus counterpart of the method of mashing up data based on tag groups described above, so a detailed description is omitted here.

FIG. 6 is a block diagram showing an exemplary structure of a computer that implements the apparatus and method of the present invention.

In FIG. 6, a central processing unit (CPU) 601 executes various kinds of processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 also stores, as needed, the data that the CPU 601 requires when executing the various kinds of processing.

The CPU 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output interface 605 is also connected to the bus 604.

The following are connected to the input/output interface 605: an input unit 606 including a keyboard, a mouse, and the like; an output unit 607 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like; a storage unit 608 including a hard disk and the like; and a communication unit 609 including a network interface card such as a LAN card or a modem. The communication unit 609 performs communication processing via a network such as the Internet.

A drive 610 is also connected to the input/output interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 610 as needed, and a computer program read from it is installed in the storage unit 608 as needed.

When the above steps and processing are implemented in software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 611.

Those skilled in the art will understand that such a storage medium is not limited to the removable medium 611 shown in FIG. 6, which stores the program and is distributed separately from the method in order to provide the program to the user. Examples of the removable medium 611 include magnetic disks, optical disks (including compact disc read-only memory (CD-ROM) and digital versatile disc (DVD)), magneto-optical disks (including MiniDisc (MD)), and semiconductor memories. Alternatively, the storage medium may be the ROM 602, a hard disk included in the storage unit 608, or the like, in which the program is stored and which is distributed to the user.

A preferred embodiment of the present invention has been described above, but the present invention is not limited to this embodiment, and any modification that does not depart from the spirit of the present invention falls within its technical scope.

The following supplementary notes are further disclosed with respect to the embodiments, including the examples above.

(Supplementary note 1) A method in which a computer classifies tag groups, each including at least one tag and corresponding data defined by the at least one tag, wherein the computer generates, from a group of synonym tag sets to which synonymous tags belong and a group of tag groups to which tags defining the data of one data list belong, an element group indicating how many tags belonging to a given tag group appear in which synonym tag set, generates a feature vector corresponding to each tag group from the element group, and classifies the tag groups into classes according to the similarity of the feature vectors of the tag groups.

(Supplementary note 2) The method of classification according to supplementary note 1, wherein the computer calculates, for each class, a core feature vector whose elements are the sums of the element values of the feature vectors of the tag groups classified into that class, obtains the similarity between the feature vector of a tag group to be classified and the core feature vector of each candidate class, and, on determining that the feature vector is not similar to any core feature vector, creates a new class and classifies the tag group to be classified into the newly created class.

(Supplementary note 3) An apparatus for classifying tag groups, each including at least one tag and corresponding data defined by the at least one tag, the apparatus comprising: a first unit that generates, from a group of synonym tag sets to which synonymous tags belong and a group of tag groups to which tags defining the data of one data list belong, an element group indicating how many tags belonging to a given tag group appear in which synonym tag set; a second unit that generates a feature vector corresponding to each tag group from the element group; and a third unit that classifies the tag groups into classes according to the similarity of the feature vectors of the tag groups.

(Supplementary note 4) The apparatus for classification according to supplementary note 3, further comprising: a fourth unit that calculates, for each class, a core feature vector whose elements are the sums of the element values of the feature vectors of the tag groups classified into that class; and a fifth unit that obtains the similarity between the feature vector of a tag group to be classified and the core feature vector of each candidate class and, on determining that the feature vector is not similar to any core feature vector, creates a new class and classifies the tag group to be classified into the newly created class.

(Supplementary note 5) A method in which a computer mashes up data based on tag groups, wherein the computer classifies the tag groups into at least one class by the method of classification according to supplementary note 1 or 2, and replaces each tag of each tag group in the same class with the designated tag of the synonym tag set to which that tag belongs.

(Supplementary note 6) An apparatus for mashing up data based on tag groups, comprising: a classification unit that classifies the tag groups into at least one class using the apparatus for classification according to supplementary note 3 or 4; and a replacement unit that replaces each tag of each tag group in the same class with the designated tag of the synonym tag set to which that tag belongs.

(Supplementary note 7) A program for classifying tag groups, each including at least one tag and corresponding data defined by the at least one tag, the program causing a computer to: generate, from a group of synonym tag sets to which synonymous tags belong and a group of tag groups to which tags defining the data of one data list belong, an element group indicating how many tags belonging to a given tag group appear in which synonym tag set; generate a feature vector corresponding to each tag group from the element group; and classify the tag groups into classes according to the similarity of the feature vectors of the tag groups.

(Supplementary note 8) A computer-readable storage medium storing the program according to supplementary note 7.

Claims

1. A method in which a computer classifies tag groups, each including at least one tag and corresponding data defined by the at least one tag, wherein the computer:
generates, from a group of synonym tag sets to which synonymous tags belong and a group of tag groups to which tags defining the data of one data list belong, an element group indicating how many tags belonging to a given tag group appear in which synonym tag set;
generates a feature vector corresponding to each tag group from the element group; and
classifies the tag groups into classes according to the similarity of the feature vectors of the tag groups,
wherein, in each generated feature vector, each element corresponds to a different synonym tag set in the group of synonym tag sets, and the value of each element is the number of tags in the corresponding tag group that belong to the synonym tag set corresponding to that element.

2. The method of classification according to claim 1, wherein the computer:
calculates, for each class, a core feature vector whose elements are the sums of the element values of the feature vectors of the tag groups classified into that class; and
obtains the similarity between the feature vector of a tag group to be classified and the core feature vector of each candidate class and, on determining that the feature vector is not similar to any core feature vector, creates a new class and classifies the tag group to be classified into the newly created class.

3. An apparatus for classifying tag groups, each including at least one tag and corresponding data defined by the at least one tag, the apparatus comprising:
a first unit that generates, from a group of synonym tag sets to which synonymous tags belong and a group of tag groups to which tags defining the data of one data list belong, an element group indicating how many tags belonging to a given tag group appear in which synonym tag set;
a second unit that generates a feature vector corresponding to each tag group from the element group; and
a third unit that classifies the tag groups into classes according to the similarity of the feature vectors of the tag groups,
wherein, in each generated feature vector, each element corresponds to a different synonym tag set in the group of synonym tag sets, and the value of each element is the number of tags in the corresponding tag group that belong to the synonym tag set corresponding to that element.

4. The apparatus for classification according to claim 3, further comprising:
a fourth unit that calculates, for each class, a core feature vector whose elements are the sums of the element values of the feature vectors of the tag groups classified into that class; and
a fifth unit that obtains the similarity between the feature vector of a tag group to be classified and the core feature vector of each candidate class and, on determining that the feature vector is not similar to any core feature vector, creates a new class and classifies the tag group to be classified into the newly created class.

5. A method in which a computer mashes up data based on tag groups, wherein the computer:
classifies the tag groups into at least one class by the method of classification according to claim 1 or 2; and
replaces each tag of each tag group in the same class with the designated tag of the synonym tag set to which that tag belongs.

6. An apparatus for mashing up data based on tag groups, comprising:
a classification unit that classifies the tag groups into at least one class using the apparatus for classification according to claim 3 or 4; and
a replacement unit that replaces each tag of each tag group in the same class with the designated tag of the synonym tag set to which that tag belongs.

7. A program for causing a computer to execute the method of classification according to claim 1 or 2.

8. A computer-readable storage medium storing the program according to claim 7.