JP2014134895A

JP2014134895A - Program, terminal device, and data processing method

Info

Publication number: JP2014134895A
Application number: JP2013001317A
Authority: JP
Inventors: Hiromi Ishisaki; 広海石先; Hajime Hattori; 元服部; Toshihiro Ono; 智弘小野
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2013-01-08
Filing date: 2013-01-08
Publication date: 2014-07-24

Abstract

PROBLEM TO BE SOLVED: To show a user what kind of communication takes place in a communication system such as an SNS. SOLUTION: A program of the present invention that visualizes the features of data comprises the steps of: obtaining data labeled on the basis of at least one variable; generating a feature vector by using the variables of the obtained data as parameters; mapping the generated feature vector into a multi-dimensional space; and displaying the mapped feature vector on a screen.

Description

The present invention relates to a technique for visualizing the features of data, and more particularly to a technique for visualizing the features of data transmitted or received in communication over a network.

Techniques for visualizing communication have been proposed in the past. For example, the technique described in Patent Document 1 observes actual face-to-face communication in order to improve productivity within an organization, and visualizes the communication style of individuals belonging to the organization, the communication style of the organization, and the communication style of groups included in the organization. Specifically, it visualizes communication by plotting each individual's communication style on a two-dimensional map based on interaction data collected from sensors.

Patent Document 2 uses the strength of agreement or disagreement expressed by the words in conversational replies to estimate the difference in values between a user and a communication partner and the degree of affirmation toward the conversation topic, and achieves visualization by converting the degree of affirmation into display attributes of an icon.

Patent Document 3 discloses a technique that records communication carried out by multiple means within or between organizations, such as e-mail logs and meeting records, and visualizes the communication by integrating the records into a common index called information capture time.

Non-Patent Document 1 discloses a technique that analyzes mobile phone communication from transmission and reception histories and visualizes human relationships as a network.

Patent Document 1: JP 2012-128882 A
Patent Document 2: JP 2006-209332 A
Patent Document 3: JP 2006-127142 A

Non-Patent Document 1: Eagle, N., and Pentland, A., "Reality Mining: Sensing Complex Social Systems", J. of Personal and Ubiquitous Computing, July 2005

However, the technique described in Patent Document 1 presupposes that the communication can actually be observed, so it is not easy to adapt it to visualizing the features of communication that occurs on an SNS or a communication tool.

The technique described in Patent Document 2 specializes in evaluating differences in values in communication interactions between users, so it is not suited to visualizing the communication features of an SNS or similar service.

The technique described in Patent Document 3 displays and diagnoses the transition of organizational communication based on how communication changes over time, but it cannot be applied to analysis involving detailed parameters such as the quality and situation of communication. Furthermore, it cannot handle features from multiple viewpoints at the same time. The same applies to Non-Patent Document 1.

Conventionally, no system has been proposed that visualizes which functions of an SNS are used and how they are used, so it has not been easy for users to judge in advance which SNS suits them.

The present invention has been made in view of these circumstances, and its object is to provide a program, a terminal device, and a data processing method capable of showing a user what kind of communication takes place in a communication system such as an SNS.

(1) To achieve the above object, the present invention adopts the following means. That is, the program of the present invention is a program for visualizing the features of data, and causes a computer to execute a series of processes comprising: a process of acquiring data labeled on the basis of at least one variable; a process of generating a feature vector using the variables of the acquired data as parameters; a process of mapping the generated feature vector into a multidimensional space; and a process of displaying the mapped feature vector on a screen.

In this way, a feature vector is generated using, as parameters, the variables of data labeled on the basis of at least one variable, the generated feature vector is mapped into a multidimensional space, and the mapped feature vector is displayed on a screen, so the features of the data can be visualized. This allows the user to see what kind of data is handled in an SNS or communication service.

(2) In the program of the present invention, the variable indicates which system function handled the data, and the feature vector is generated on the basis of a system function feature vector that characterizes the system functions.

In this way, the variable indicates which system function handled the data, and the feature vector is generated on the basis of a system function feature vector that characterizes the system functions, so it is possible to display which system functions the data used in communication was exchanged on. For example, when a single communication service is taken as the unit of analysis and users have posted many comments through a "SYNCHRONOUS" communication tool such as chat, it can be understood that "SYNCHRONOUS" services are frequently used on that service. Based on such information, a system function vector can be generated for each user and visualized by the frequency of each system function feature variable. To reduce the effect of differences in the amount of analyzed data, the frequencies may be normalized by the total number of analysis units.

(3) In the program of the present invention, the variable indicates the situation on the system in which the data was handled, and the feature vector is generated on the basis of a situation feature vector that characterizes the situation on the system.

In this way, the variable indicates the situation on the system in which the data was handled, and the feature vector is generated on the basis of a situation feature vector that characterizes the situation on the system, so it is possible to display the situation in which the data used in communication was exchanged. For example, take one communication service as the unit of analysis. When a group of users has posted many comments through a communication function that third parties cannot view, such as private chat, it can be understood that private chat is an important function characterizing the communication service. Based on such information, a situation feature vector can be generated for each user and visualized by the frequency of the labeling results for each situation feature variable. To reduce the effect of differences in the amount of analyzed data, the frequencies may be normalized by the total number of analysis units.

(4) In the program of the present invention, the variable indicates the user attitude with which the data was handled, and the feature vector is generated on the basis of a user attitude vector that characterizes the user's attitude.

In this way, the variable indicates the user attitude with which the data was handled, and the feature vector is generated on the basis of a user attitude vector that characterizes the user's attitude, so it is possible to display what attitude the text posted by the user suggests. For example, when users have posted many comments labeled "THANK" or "GREET", it can be understood that a great deal of sociable behavior takes place on that communication service (the unit of analysis). Based on such information, a user attitude vector can be generated and visualized by the frequency of each user attitude skill vector variable. To reduce the effect of differences in the amount of analyzed data, the frequencies may be normalized by the total number of analysis units.

(5) The program of the present invention selects important words contained in the data and generates the feature vector on the basis of an important word vector whose elements are words of high importance.

In this way, important words contained in the data are selected and the feature vector is generated on the basis of an important word vector whose elements are words of high importance, so the features of the data handled in communication can be displayed with emphasis.

(6) The program of the present invention further includes a process of inputting data and a process of labeling the input data on the basis of at least one variable.

In this way, the program further includes a process of inputting data and a process of labeling the input data on the basis of at least one variable, so the input data can be labeled automatically.

(7) The terminal device of the present invention is a terminal device that visualizes the features of data, and comprises: a data acquisition unit that acquires data labeled on the basis of at least one variable; a feature vector generation unit that generates a feature vector using the variables of the acquired data as parameters; a mapping unit that maps the generated feature vector into a multidimensional space; and a display unit that displays the mapped feature vector on a screen.

(8) The terminal device of the present invention further includes a classifier that labels input data on the basis of at least one variable.

In this way, the terminal device further includes a classifier that labels input data on the basis of at least one variable, so the input data can be labeled automatically.

(9) The data processing method of the present invention is a data processing method for visualizing the features of data, and includes at least the steps of: acquiring data labeled on the basis of at least one variable; generating a feature vector using the variables of the acquired data as parameters; mapping the generated feature vector into a multidimensional space; and displaying the mapped feature vector on a screen.

According to the present invention, the features of data can be visualized. This allows the user to see what kind of data is handled in an SNS or communication service.

FIG. 1 is a diagram showing a schematic configuration of the data processing system according to the present embodiment.
FIG. 2 is a diagram showing a GUI image.
FIG. 3 is a flowchart showing the coding (labeling) operation.
FIG. 4 is a flowchart showing the operation of the feature vector generation module.
FIG. 5 is a diagram showing the frequency of each system function feature variable.
FIG. 6 is a flowchart showing the system function vector extraction process.
FIG. 7 is a diagram showing the frequency of the labeling results for each situation feature variable.
FIG. 8 is a flowchart showing the situation feature vector extraction process.
FIG. 9 is a diagram showing the frequency of each user attitude skill vector variable.
FIG. 10 is a flowchart showing the user attitude vector extraction process.
FIG. 11 is a diagram showing how feature vectors are grouped by clustering.
FIG. 12 is a diagram showing feature vectors visualized hierarchically.
FIG. 13 is a flowchart showing the operation of the service feature extraction and presentation module.

Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 is a diagram showing a schematic configuration of the data processing system according to the present embodiment. This data processing system comprises a communication data collection module 5, a database 7, a labeling module 9, a feature vector generation module 11, a service feature extraction and presentation module 21, and a display module 23. The communication data collection module 5 collects data from an SNS (Social Networking Service) 1 and from e-mail and call data 3. For example, data can be collected by crawling through an API (Application Programming Interface). In this case, the API of an Internet radio station or the API of Twitter (registered trademark) can be used.

As input, it is possible to use, for example, text posted on an SNS during a certain period, comment data from a multimedia service, e-mail interaction data, chat data, and the like. When a chat function, a blog function, and so on exist within the same SNS, their data can be treated as a single data set or divided by function. The data collected in this way is stored in the database 7.

The labeling module 9 performs coding (labeling) on the data collected by the communication data collection module 5. For this coding, a GUI (Graphical User Interface) can be provided on the Web so that an operator codes the data manually and the results are stored in the DB. As the coding criteria, for example, the communication classification scheme described in Related Document 1 can be used.

[Related Document 1]
Susan C. Herring (2007), A Faceted Classification Scheme for Computer-Mediated Discourse. Language@Internet. http://www.languageatinternet.org/articles/2007/761
In the present invention, coding is performed based on the following variables.

[System feature variables]
M1 (Synchronicity), M2 (Message transmission), M3 (Persistence of transcript), M4 (Size of message buffer), M5 (Channels of communication), M6 (Anonymous messaging), M7 (Private messaging), M8 (Filtering), M9 (Quoting), M10 (Message format)
A value can be set for each of these system feature variables. For example, for M1, the label Synchronous can be assigned for the value 1 and Asynchronous for the value 2.

[Situation feature variables]
S1 (Participation Structure), S2 (Participant characteristics), S3 (Purpose), S4 (Topic or Theme), S5 (Tone), S6 (Activity), S7 (Norms), S8 (Code)
These situation feature variables may be entered as free text, or a set of choices may be provided in advance.

FIG. 2 is a diagram showing a GUI image. For example, as shown in FIG. 2, the communication data is displayed on the left side of the screen, and the coding result can be entered on the right side. Input can be made with check boxes or by selection on a touch panel. The coding itself can also be carried out by several people. In that case, the results can be compared, and the agreement rate and any differing results can be presented back to the coders.
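The comparison of results from several coders could be sketched as follows. This is only an illustration: the description does not define how the agreement rate is computed, so simple percent agreement is assumed here, and the function name `agreement_report` and the data layout are hypothetical.

```python
def agreement_report(codings):
    """Compare the labels that several coders gave to the same items.

    codings: dict mapping coder name -> dict mapping item id -> label.
    Returns (agreement_rate, disagreements), where disagreements maps each
    item on which the coders differ to the label given by every coder.
    Assumption: the rate is the share of commonly coded items on which all
    coders chose the same label (simple percent agreement).
    """
    if not codings:
        return 0.0, {}
    coders = list(codings)
    # Only items coded by every coder can be compared.
    shared = set.intersection(*(set(codings[c]) for c in coders))
    disagreements = {}
    for item in sorted(shared):
        labels = {c: codings[c][item] for c in coders}
        if len(set(labels.values())) > 1:
            disagreements[item] = labels
    rate = 1 - len(disagreements) / len(shared) if shared else 0.0
    return rate, disagreements


# Hypothetical codings of variable M1 for four messages by two coders.
codings = {
    "coder_a": {1: "synchronous", 2: "asynchronous", 3: "synchronous", 4: "asynchronous"},
    "coder_b": {1: "synchronous", 2: "synchronous", 3: "synchronous", 4: "asynchronous"},
}
rate, to_review = agreement_report(codings)
```

The items in `to_review` are the ones that would be shown to the coders again.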

In addition to the above, other variables such as scene variables and user attitude variables can be freely defined. For example, scene variables can use the season, the time of day, the weather, and so on. As user attitude variables, the speaker's attitude (presentation, agreement, rejection) and the like can be used; for example, they can be defined as follows using the technique of Related Document 2 below.

[Related Document 2]
Herring, S. C., Das, A., & Penumarthy, S. (2005). CMC act taxonomy. http://www.slis.indiana.edu/faculty/herring/cmc.acts.html
A1 (Inquire), A2 (Request), A3 (Invite), A4 (Desire), A5 (React), A6 (Manage), A7 (Direct), A8 (Accept), A9 (Apologize), A10 (Repair), A11 (Reject), A12 (Elaborate), A13 (Thank), A14 (Inform), A15 (Claim), A16 (Greet)
The labeling module 9 can also be given the function of a classifier such as an SVM (Support Vector Machine) so that coding is performed automatically. When coding is performed automatically, prior information registered in the system in advance can be used, or training data can be collected and coded beforehand so that the classifier performs the labeling automatically. For example, when the information obtained from SNS (A) is fixed for M1 through M10, the same information can be automatically assigned to other data obtained from SNS (A).
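The use of prior information registered per service could look like the following minimal sketch. The record layout, the function name `apply_fixed_labels`, and the concrete label values are assumptions made for illustration and do not appear in the description.

```python
def apply_fixed_labels(records, fixed_by_source):
    """Attach pre-registered system feature labels to every record of a source.

    records: list of dicts with at least a "source" key and optional "labels".
    fixed_by_source: dict mapping a source name to its fixed labels (e.g. M1..M10).
    Labels already present on a record take precedence over the preset values.
    """
    labeled = []
    for rec in records:
        preset = fixed_by_source.get(rec["source"], {})
        labels = {**preset, **rec.get("labels", {})}
        labeled.append({**rec, "labels": labels})
    return labeled


# Hypothetical prior information: on SNS_A every message is synchronous (M1=1)
# and private messaging (M7) is coded as 2.
fixed = {"SNS_A": {"M1": 1, "M7": 2}}
records = [
    {"source": "SNS_A", "text": "hello"},
    {"source": "SNS_B", "text": "good morning", "labels": {"M1": 2}},
]
labeled = apply_fixed_labels(records, fixed)
```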

Labels for S1 through S8 and the like can also be assigned automatically. For example, once coding results for a large amount of communication data have been accumulated as training data, a classifier can assign labels automatically. For example, communication data to which the coding result S1 has been assigned is converted into feature vectors by TF-IDF, and an SVM judges whether S1 applies, so that the label can be assigned automatically.
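A small self-contained sketch of this idea follows: texts are turned into TF-IDF vectors and a linear SVM decides whether a label applies. It is an illustration under several assumptions that are not in the description: whitespace tokenization, a smoothed IDF formula, a linear SVM without bias term trained by subgradient descent on the hinge loss, and made-up training sentences. A production system would more likely rely on an existing SVM library.

```python
import math
from collections import Counter


def fit_tfidf(docs):
    """Learn smoothed IDF weights {word: idf} from whitespace-tokenized text."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in sorted(df.items())}


def transform(doc, idf):
    """Map one text to a sparse TF-IDF vector; unknown words are dropped."""
    toks = doc.lower().split()
    tf = Counter(t for t in toks if t in idf)
    return {w: c / len(toks) * idf[w] for w, c in tf.items()}


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    """Subgradient descent on the L2-regularized hinge loss (labels are +1/-1)."""
    w = {}
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(w.get(k, 0.0) * v for k, v in xi.items())
            for k in w:                      # shrink step from the regularizer
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:                 # hinge loss is active for this sample
                for k, v in xi.items():
                    w[k] = w.get(k, 0.0) + eta * yi * v
    return w


def predict(w, x):
    """Return +1 if the label is judged to apply, otherwise -1."""
    return 1 if sum(w.get(k, 0.0) * v for k, v in x.items()) > 0 else -1


# Hypothetical training data: +1 means the label applies, -1 means it does not.
TEXTS = ["thanks a lot", "thank you so much thanks",
         "where is the file", "send the report now"]
LABELS = [1, 1, -1, -1]
idf = fit_tfidf(TEXTS)
weights = train_linear_svm([transform(t, idf) for t in TEXTS], LABELS)
```

A new message is then labeled with `predict(weights, transform(message, idf))`.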

FIG. 3 is a flowchart showing the coding (labeling) operation. First, data is acquired via the communication data collection module 5 (step S1), and it is determined whether a classifier exists (step S2). If a classifier exists, the process proceeds to step S7. If no classifier exists in step S2, it is determined whether label data exists (step S3); if label data exists, the process proceeds to step S6. If no label data exists, label data is acquired (step S4), the GUI is displayed (step S5), and a classifier is generated based on the data labeled by the operator (step S6).

Here, when the classifier uses, for example, a Support Vector Machine to identify whether label A applies, important words are extracted from the group of text data to which label A has been assigned, and the text is converted into feature vectors based on the frequency of the important words (for example, a bag of words weighted by TF-IDF). By using the training data to which label A has been assigned, together with its feature vectors, as positive examples, and the training data to which label A has not been assigned, together with its feature vectors, as negative examples, a classifier that decides whether label A applies can be generated. Labels are then assigned (step S7), and the process ends.
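The branching of steps S1 to S7 can be summarized in a short sketch. The callables `collect_labels` (standing in for the operator working through the GUI) and `train` (standing in for classifier generation) are hypothetical placeholders, and the returned list of step names exists only to make the control flow visible.

```python
def run_labeling(data, train, collect_labels, classifier=None, label_data=None):
    """Follow the flow of FIG. 3 and return (labels, visited_steps)."""
    steps = ["S1", "S2"]                  # S1: data acquired, S2: classifier exists?
    if classifier is None:
        steps.append("S3")                # S3: label data exists?
        if label_data is None:
            steps += ["S4", "S5"]         # S4/S5: obtain labels via the GUI
            label_data = collect_labels(data)
        steps.append("S6")                # S6: generate the classifier
        classifier = train(label_data)
    steps.append("S7")                    # S7: assign labels
    return [classifier(item) for item in data], steps


# Hypothetical stand-ins for the operator and for classifier training.
def fake_operator(data):
    return [(text, "A13" if "thank" in text else "A14") for text in data]


def fake_train(label_data):
    return lambda text: "A13" if "thank" in text else "A14"


labels, path = run_labeling(["thank you", "meeting at noon"], fake_train, fake_operator)
```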

In FIG. 1, the feature vector generation module 11 comprises a situation feature skill extraction function 13, a system function skill extraction function 15, a user attitude skill extraction function 17, and an important word extraction function 19, and converts the communication data into feature quantities based on the coding results and the communication data. For example, the entered coding results can be expressed as a multidimensional vector whose parameters are the input values of each variable. For example, system function features, situation features, and user attitude features can each be extracted as vectors. These vectors can also be combined into a single feature vector. Furthermore, the communication data can be vectorized by selecting important words from the accumulated communication data using the TF-IDF method, taking the words of high importance as the elements of the vector, and calculating the frequency with which each element word appears in the data.
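The important word vector described here might be built roughly as follows. The description does not say how per-document TF-IDF scores are turned into one ranking, so this sketch assumes that each word is ranked by its highest TF-IDF score in any document; the function names, the whitespace tokenization, and the sample texts are likewise assumptions.

```python
import math
from collections import Counter


def select_important_words(docs, k):
    """Pick the k words with the highest TF-IDF score found in any document."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))
    score = Counter()
    for toks in tokenized:
        for w, c in Counter(toks).items():
            tfidf = c / len(toks) * math.log(n / df[w])
            score[w] = max(score[w], tfidf)
    # Sort by descending score; break ties alphabetically for reproducibility.
    ranked = sorted(score, key=lambda w: (-score[w], w))
    return ranked[:k]


def keyword_vector(text, keywords):
    """Count how often each selected important word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in keywords]


# Hypothetical accumulated communication data.
docs = ["the live show is great", "the chat is fun chat", "the radio show"]
keywords = select_important_words(docs, 2)
vector = keyword_vector("chat chat radio show", keywords)
```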

FIG. 4 is a flowchart showing the operation of the feature vector generation module. First, the coding results and the communication data are input (step T1). Next, the system function feature vector is extracted (step T2), followed by the situation feature vector (step T3), the user attitude vector (step T4), and the important word vector (step T5). The extracted vectors are then integrated (step T6), and the process ends. The processes for extracting these vectors are described next.
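One way to picture the integration in step T6 is plain concatenation, shown in the sketch below. The description only says that the vectors are integrated, so concatenation, the function name `integrate_vectors`, and the sample values are assumptions.

```python
def integrate_vectors(parts):
    """Concatenate named partial vectors into one feature vector.

    parts: dict mapping a name to a list of numbers, in the desired order.
    Returns (vector, slices), where slices[name] = (start, end) records where
    each partial vector sits inside the integrated vector.
    """
    vector, slices = [], {}
    for name, vec in parts.items():
        slices[name] = (len(vector), len(vector) + len(vec))
        vector.extend(vec)
    return vector, slices


# Hypothetical outputs of steps T2 to T5 for one analysis unit.
parts = {
    "system": [0.5, 0.5],
    "situation": [1.0],
    "attitude": [0.25, 0.75],
    "keywords": [2, 1],
}
feature_vector, slices = integrate_vectors(parts)
```

Keeping the slice positions makes it possible to display or weight each group of features separately later on.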

[System function vector extraction process]
Based on the label data assigned to the text obtained from each category, the system functions on which the communication data was exchanged are extracted as features. For example, when a single communication service is taken as the unit of analysis and users have posted many comments through a "SYNCHRONOUS" communication tool (such as chat), it can be understood that "SYNCHRONOUS" services are frequently used on that service. Based on this information, a system function vector for the user is generated.

FIG. 5 is a diagram showing the frequency of each system function feature variable. For this kind of visualization, the frequencies can be normalized by the total number of analysis units to reduce the effect of differences in the amount of analyzed data. Processing does not necessarily have to be carried out per service; macro-level communication data generated within a service may be treated as one unit of analysis.

FIG. 6 is a flowchart showing the system function vector extraction process. First, label data is acquired (step P1), and the number of elements is counted (step P2). Next, it is determined whether all the system function variables have been counted (step P3); if not, the process returns to step P2. When all the system function variables have been counted, the counts are normalized by the total number of analysis units (step P4), and a vector is generated (step P5).
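Steps P1 to P5 amount to counting label values per variable and dividing by a total. A minimal sketch follows, assuming that the "total number of analysis units" is the number of labeled records passed in; the variable values shown for M7 and the function name `frequency_vector` are hypothetical.

```python
from collections import Counter


def frequency_vector(label_data, variable_values):
    """Count label values per variable and normalize by the number of records.

    label_data: list of dicts mapping a variable name (e.g. "M1") to its label.
    variable_values: dict mapping each variable to the list of values to count.
    Returns (names, vector) with one entry per (variable, value) pair.
    """
    total = len(label_data)                                    # used in step P4
    names, vector = [], []
    for var, values in variable_values.items():                # loop of step P3
        counts = Counter(rec[var] for rec in label_data if var in rec)  # step P2
        for value in values:
            names.append(f"{var}={value}")
            vector.append(counts[value] / total if total else 0.0)
    return names, vector                                       # step P5


# Hypothetical label data for four messages of one service (step P1).
label_data = [
    {"M1": "synchronous", "M7": "yes"},
    {"M1": "synchronous", "M7": "no"},
    {"M1": "asynchronous", "M7": "no"},
    {"M1": "synchronous", "M7": "no"},
]
schema = {"M1": ["synchronous", "asynchronous"], "M7": ["yes", "no"]}
names, vector = frequency_vector(label_data, schema)
```

The same counting scheme would apply to situation variables and user attitude variables, with a different set of variable names.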

[Situation feature vector extraction process]
Based on the label data assigned to the text obtained from each category, the situation in which the communication took place is extracted as a feature. For example, consider the case where one communication service is the unit of analysis. When a group of users has posted many comments through a communication function that third parties cannot view (such as private chat), it can be understood that private chat is an important function characterizing the communication service. Based on this information, a situation feature vector for the user is generated.

FIG. 7 is a diagram showing the frequency of the labeling results for each situation feature variable. For this kind of visualization, the frequencies can be normalized by the total number of analysis units to reduce the effect of differences in the amount of analyzed data. For items such as S2 whose elements cannot be set in advance, the age and gender distribution of the SNS or community used by the user can be extracted beforehand and registered as patterns. For example, the case where the community on SNS1 consists only of women in their thirties is registered as pattern PAT1, and the case where the community on SNS2 has a male-to-female ratio of 7:3 and an age distribution of twenties : thirties : forties = 3:3:4 is registered as pattern 2, which makes the frequency calculation possible.
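Pattern registration for an item such as S2 could be sketched as below. The description gives only the two example patterns, so the matching rule (identical distributions after converting ratios to shares rounded to three decimals), the class name `PatternRegistry`, and the generated identifiers are assumptions.

```python
from collections import Counter


def to_shares(ratios):
    """Convert a ratio dict such as {"male": 7, "female": 3} to sorted shares."""
    total = sum(ratios.values())
    return tuple(sorted((k, round(v / total, 3)) for k, v in ratios.items() if v))


class PatternRegistry:
    """Assign a pattern id to each distinct gender and age distribution."""

    def __init__(self):
        self._ids = {}

    def register(self, gender, age):
        key = (to_shares(gender), to_shares(age))
        # A distribution seen before keeps its id; a new one gets the next id.
        return self._ids.setdefault(key, f"PAT{len(self._ids) + 1}")


registry = PatternRegistry()
p1 = registry.register({"female": 1}, {"30s": 1})
p2 = registry.register({"male": 7, "female": 3}, {"20s": 3, "30s": 3, "40s": 4})
# The same distribution expressed with different absolute numbers.
p3 = registry.register({"male": 70, "female": 30}, {"20s": 30, "30s": 30, "40s": 40})

# Frequency of each registered pattern over the observed communities.
pattern_counts = Counter([p1, p2, p3])
```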

FIG. 8 is a flowchart showing the situation feature vector extraction process. First, label data is acquired (step Q1), and patterns are registered (step Q2). Next, the number of elements is counted (step Q3), and it is determined whether all situation variables have been counted (step Q4). If not all situation variables have been counted, the process returns to step Q3. When all situation variables have been counted, the counts are normalized by the total number of analysis units (step Q5), and a vector is generated (step Q6).
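Pattern registration (step Q2) followed by counting and normalization (steps Q3 to Q6) might look like the sketch below. The dictionary layout, the key `PAT2` for the second pattern, and exact-match comparison of community profiles are assumptions introduced for illustration; the description does not specify how a community is matched against a registered pattern.

```python
from collections import Counter

# Q2: registered community patterns (hypothetical data layout).
PATTERNS = {
    # Community of women in their 30s only.
    "PAT1": {"gender": {"female": 1.0}, "age": {"30s": 1.0}},
    # Male-to-female ratio 7:3, ages 20s:30s:40s = 3:3:4.
    "PAT2": {
        "gender": {"male": 0.7, "female": 0.3},
        "age": {"20s": 0.3, "30s": 0.3, "40s": 0.4},
    },
}


def match_pattern(community):
    """Return the name of the registered pattern equal to the profile, or None."""
    for name, pattern in PATTERNS.items():
        if pattern == community:
            return name
    return None


def situation_vector(communities, n_units):
    """Count pattern occurrences (Q3, Q4), normalize (Q5), build the vector (Q6)."""
    counts = Counter(match_pattern(c) for c in communities)
    return [counts.get(name, 0) / n_units for name in PATTERNS]


observed = [PATTERNS["PAT1"], PATTERNS["PAT2"], PATTERNS["PAT2"]]
print(situation_vector(observed, n_units=len(observed)))
```

A practical system would likely match profiles within a tolerance rather than by equality; that refinement is left out to keep the sketch short.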

[User attitude vector extraction processing]
Based on the label data assigned to the text obtained from each category, the attitude suggested by the text posted by the user is extracted as a feature. For example, when a user has posted many comments corresponding to "THANK" or "GREET", it can be understood that a large amount of sociable behavior takes place on the corresponding communication service (analysis unit). Based on this information, a user attitude vector is generated.

FIG. 9 is a diagram showing the frequency of each user attitude skill vector variable. When visualizing in this way, the frequencies can also be normalized by the total number of analysis units in order to reduce the effect of differences in the amount of analyzed data.

FIG. 10 is a flowchart showing the user attitude vector extraction process. First, label data is acquired (step R1), and the number of elements is counted (step R2). Next, it is determined whether all user attitude variables have been counted (step R3). If not all user attitude variables have been counted, the process returns to step R2. When all user attitude variables have been counted, the counts are normalized by the total number of analysis units (step R4), and a vector is generated (step R5).

In FIG. 1, the service feature extraction/presentation module 21 maps the feature vectors created by the feature vector generation module 11 onto a multidimensional space. For example, by applying principal component analysis and using the first and second principal components as the display axes, the feature vectors can be represented as a plot on a two-dimensional plane. Adding the third principal component enables three-dimensional display. Other visualization techniques, such as SOM (Self-Organizing Map), can also be used.
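The projection onto the first and second principal components can be illustrated with the following sketch. It uses power iteration with deflation on the covariance matrix, which is a technique chosen here only to keep the example free of external libraries; the description does not state how the principal components are computed, and a numerical library's eigenvalue solver would normally be used instead. All function names are hypothetical.

```python
import math


def center(rows):
    """Subtract the per-dimension mean from every feature vector."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iters=200):
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    d = len(cov)
    v = [float(i + 1) for i in range(d)]  # arbitrary non-zero start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v


def pca_project(rows, k=2):
    """Project rows onto the first k principal components."""
    x = center(rows)
    cov = covariance(x)
    d = len(cov)
    components = []
    for _ in range(k):
        eigval, v = top_component(cov)
        components.append(v)
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in x]


points_2d = pca_project([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]], k=2)
print(points_2d)
```

Calling `pca_project(rows, k=3)` would give the coordinates for the three-dimensional display mentioned above.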

FIG. 11 is a diagram showing how feature vectors are grouped by clustering. A plurality of groups 101 are formed within the maximum grouping frame 100. Each group 101 includes at least one feature vector 102. As with the feature vectors 102, feature vectors having different features belong to different groups depending on the classification criteria. As shown in FIG. 11, the communication data represented by the feature vectors can be grouped and displayed by additionally applying a clustering method. Clustering can be performed by, for example, the K-means method. By clicking a plot on the screen, the classification result of the communication data on each SNS can be viewed. Note that this processing can also be performed before principal component analysis is applied.
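A minimal sketch of grouping feature vectors with the K-means method is given below. The function name `kmeans`, the deterministic initialization from the first `k` points, and the iteration limit are assumptions made so that the example is reproducible; the description names the K-means method but gives no parameters.

```python
def kmeans(points, k, iters=100):
    """Group points into k clusters; returns (assignment, centroids)."""
    # Assumption: initialize the centroids from the first k points.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_assignment == assignment:
            break  # no point changed its group, so the result is stable
        assignment = new_assignment
    return assignment, centroids


vectors = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
groups, centers = kmeans(vectors, k=2)
print(groups)
# prints [0, 1, 0, 1, 0, 1]
```

The same function can be applied either to the original feature vectors or to their principal component coordinates, matching the note that clustering may precede principal component analysis.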

FIG. 12 is a diagram showing feature vectors visualized hierarchically. In FIG. 12, a plurality of groups 206, 207, and 220 are formed in the top layer 201 within the maximum grouping frame 200. The top-layer groups 206, 207, and 220 have second-layer groups 202, 203, and 204, respectively. The second-layer group 202 includes feature vectors 208 and 209, and the feature vectors 208 and 209 in turn have third-layer groups 210 and 211. The same applies to the second-layer groups 203 and 204. That is, the second-layer groups 214 and 215 have third-layer groups 212 and 213, respectively, and the second-layer groups 218 and 219 have third-layer groups 216 and 217, respectively.

As shown in FIG. 12, the service feature extraction/presentation module 21 can also hierarchically visualize the feature vectors based on system feature variables, the feature vectors based on situation feature variables, and the feature vectors based on important words in the communication data. In FIG. 12, the space based on the system feature variables is used as the first space, but the order can be changed. Furthermore, for the feature vectors based on important words in the communication data, representative important words can be displayed in the space based on the clustering result.

FIG. 13 is a flowchart showing the operation of the service feature extraction/presentation module. First, a feature vector is extracted (step V1), and it is determined whether to perform dimension compression for the K-means method (step V2). If dimension compression is not performed, the process proceeds to step V5. If dimension compression is performed, principal component analysis is carried out (step V3), and the Nth principal component is extracted (step V4). Next, clustering is performed (step V5), and display processing is performed to visualize the clustering result (step V6). It is then determined whether all feature vectors have been processed (step V7). If not all feature vectors have been processed, the process returns to step V1. When all feature vectors have been processed, representative important words are extracted (step V8), the feature vectors based on the extracted important words are displayed in the space (step V9), and the process ends.
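The extraction of representative important words for each cluster (step V8) could be sketched as follows. The description does not say how a representative word is chosen, so ranking words by their summed importance weight within each cluster is an assumption, as are the function name `representative_keywords` and the sample vocabulary.

```python
from collections import defaultdict


def representative_keywords(keyword_vectors, assignment, vocabulary, top_n=2):
    """For each cluster, return the top_n words with the largest summed weight."""
    totals = defaultdict(lambda: [0.0] * len(vocabulary))
    for vec, cluster in zip(keyword_vectors, assignment):
        for j, weight in enumerate(vec):
            totals[cluster][j] += weight
    result = {}
    for cluster, sums in totals.items():
        # Sort by descending weight; ties are broken alphabetically.
        ranked = sorted(range(len(vocabulary)),
                        key=lambda j: (-sums[j], vocabulary[j]))
        result[cluster] = [vocabulary[j] for j in ranked[:top_n]]
    return result


vocabulary = ["thanks", "game", "photo"]
keyword_vectors = [[0.9, 0.1, 0.0], [0.8, 0.0, 0.3], [0.0, 0.7, 0.2]]
assignment = [0, 0, 1]  # cluster index of each vector, e.g. from clustering in step V5
print(representative_keywords(keyword_vectors, assignment, vocabulary))
# prints {0: ['thanks', 'photo'], 1: ['game', 'photo']}
```

The returned words would then be drawn next to the corresponding group in the display of step V9.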

As described above, according to the present embodiment, the communication status can be visualized even for an SNS from which it is difficult to actually acquire data, so that information indicating what kind of communication takes place on the SNS can be provided to the user. As a result, the user can easily select an SNS that suits him or her. In addition, according to the present embodiment, feature quantities from multiple viewpoints (system variables, situation variables, communication variables, and the like) can be displayed in a unified manner, which could not be achieved with the conventional technology.

5 Communication data collection module
7 Database
9 Labeling module
11 Feature vector generation module
13 Situation feature skill extraction function
15 System function skill extraction function
17 User attitude skill extraction function
19 Important word extraction function
21 Service feature extraction/presentation module
23 Display module

Claims

1. A program for visualizing characteristics of data, the program causing a computer to execute a series of processes comprising:
a process of acquiring data labeled based on at least one variable;
a process of generating a feature vector using the variable of the acquired data as a parameter;
a process of mapping the generated feature vector into a multidimensional space; and
a process of displaying the mapped feature vector on a screen.

2. The program according to claim 1, wherein the variable represents a function on the system in which the data is handled, and the feature vector is generated based on a system function feature vector characterized by the function on the system.

3. The program as recited above, wherein the variable represents a situation on the system in which the data is handled, and the feature vector is generated based on a situation feature vector characterized by the situation on the system.

4. The program according to claim 1, wherein the variable indicates an attitude of a user in handling the data, and the feature vector is generated based on a user attitude vector characterized by the attitude of the user.

5. The program according to claim 2, wherein important words included in the data are selected, and the feature vector is generated based on an important word vector whose elements are words of high importance.

6. The program according to claim 1, further comprising:
a process of inputting data; and
a process of labeling the input data based on at least one variable.

7. A terminal device for visualizing characteristics of data, comprising:
a data acquisition unit that acquires data labeled based on at least one variable;
a feature vector generation unit that generates a feature vector using the variable of the acquired data as a parameter;
a mapping unit that maps the generated feature vector into a multidimensional space; and
a display unit that displays the mapped feature vector on a screen.

8. The terminal device according to claim 7, further comprising a discriminator that performs labeling on the input data based on at least one variable.

9. A data processing method for visualizing characteristics of data, comprising at least the steps of:
acquiring data labeled based on at least one variable;
generating a feature vector using the variable of the acquired data as a parameter;
mapping the generated feature vector into a multidimensional space; and
displaying the mapped feature vector on a screen.