WO2016072769A2

WO2016072769A2 - Method and system for visualizing data using comment data of object

Info

Publication number: WO2016072769A2
Application number: PCT/KR2015/011861
Authority: WO
Inventors: 이경원; 김기남; 하효지
Original assignee: 아주대학교산학협력단
Priority date: 2014-11-07
Filing date: 2015-11-05
Publication date: 2016-05-12
Also published as: KR101602898B1; WO2016072769A3

Abstract

The present invention relates to a method and a system for visualization using expression components collected from comment data of an object. For example, when the object is a content item, the invention visualizes the expression components appearing in comments in which consumers of the content item express emotions or opinions about it. By visualizing the expression components that express users' emotions or opinions in user comments on an object, the invention analyzes not only the object information already available, such as the production company and price, but also the emotions or opinions users have when using the object, and can thereby provide information that serves as a selection criterion for a user who intends to newly use the object.

Description

Data visualization method and system using comment data of object

The present invention relates to a data visualization method and system using comment data of an object, and more particularly, to a technology for visualizing elements expressing emotions or opinions of a user or a consumer.

The present invention is derived from a study conducted as part of the Humanities and Social Foundations research project of the Ministry of Education and the Korea Research Foundation [Task Management Number: S-2013-A0403-00010, Title: Movie Recommendation System Using Situational Vocabulary Distribution Map Visualization].

In general, users who consume content such as movies, music, or literary works, or who use goods or services, leave comments containing reviews of, or feelings about, the content, goods, or services (hereinafter referred to as "objects"). Users who have not yet used an object, or who want information about it, obtain that information by referring to comments left by users who have already used it.

Because comment data about an object is text-based, it takes a user a long time to obtain information about the object by reading the comment data. In particular, when the amount of comment data for an object is enormous, because many users have left comments or because comments have accumulated over a long period, merely reading the comments requires considerable effort.

Therefore, to cope with this problem, research has been conducted on technology that searches for comments or objects based on the vocabulary in the comment data and thereby shortens the time a user needs to search for comments and objects.

An example of a method of searching for content using comment information on such content is described in Korean Patent Registration No. 10-0917784, "Method and System for Retrieving Group Emotion Information Based on Comment on Content".

The prior art provides a search method and system that collect comments on various content on the Internet to build a search database (hereinafter referred to as a DB) and use the search DB to present objective and reliable ranking results for emotional queries. In particular, for a query containing emotional words, it adjusts the recommendation priority of an object by reflecting the frequency with which those emotional words appear in the comments.

However, although the above prior art describes a technique for retrieving emotional words from comments on an object, it does not effectively show the overall emotion or opinion a user can expect for the object, and it has the limitation that objects with many comments are recommended preferentially.

This is a limitation caused by the prior art adopting text-based emotional word retrieval, and it is required to develop a technology that can effectively show the overall emotion or opinion expected for one object (content, product or service).

The present invention is derived to solve the problems of the prior art described above. By visualizing elements that express users' emotions or opinions appearing in existing user comments on an object (content, goods, or services), it aims to analyze not only the objective information provided by existing object information, such as the production company and price, but also the emotions or opinions users express when using the object, and thereby to provide information that can serve as a basis for object selection to users who intend to newly use the object.

The present invention aims to provide a method and system that intuitively visualize the overall distribution of emotions or opinions expressed about an object by visualizing, based on semantic distance, a plurality of elements expressing emotions or opinions about the object.

The present invention may visualize the emotions or opinions representatively expressed about one object; in addition, by visualizing the plurality of expression elements based on relative semantic distance, it aims to provide a means for intuitively recognizing the relative distances between, and the distribution of, the expression elements expressed about the object.

In addition, the present invention is not limited to text; it aims to provide a means for visualizing a distribution based on semantic distance that reflects the various non-verbal elements capable of expressing emotions or opinions, such as emoticons and icons. Because it is free from the constraints of text, it can also provide a means for visualizing, in a single frame, opinions or feelings expressed in various foreign languages.

In addition, the present invention aims to calculate the frequency of the expression elements expressing each emotion or opinion in comment data obtainable through various paths, for example comment data collected from websites, and to present the result as an easily understood graph.

To achieve the above objects, a method of visualizing expression elements according to an embodiment of the present invention comprises extracting a plurality of expression elements from comment data collected for an object selected by a user, and visualizing the extracted expression elements based on a distribution according to the semantic distances between them.

The method may further include measuring the frequency with which each extracted expression element appears in the comment data, and the visualizing step may visualize the extracted expression elements according to the measured frequencies.

The method may further include, after extracting the expression elements, comparing the extracted expression elements with previously extracted expression elements and checking whether a new expression element has been added.

When a new expression element has been added, the visualizing step may include determining, among the previously extracted expression elements, one or more adjacent expression elements whose semantic distance from the new expression element is within a predetermined criterion, and determining the semantic position of the new expression element based on its semantic distances from the adjacent expression elements.

The method may further include, after extracting the expression elements, determining the validity of the extracted expression elements and removing any expression element that is invalid for the object selected by the user. The method may also include measuring the frequency with which each extracted expression element appears in the comment data, in which case the validity determination may reflect the measured frequency.

The method may further include, after extracting the expression elements, identifying the frequency with which an extracted expression element is extracted from objects other than the object selected by the user, and determining whether the expression element is extracted at a predetermined frequency or more from a predetermined number or more of those other objects. The method may further include measuring the frequency with which the expression element is extracted and adjusting the measured frequency by weighting it according to the result of that determination, and the visualizing step may visualize the expression elements by reflecting the adjusted frequency.

The method may also include, after extracting the expression elements, measuring the frequency with which an expression element is extracted from the comment data, comparing the measured frequency with the frequency at which the expression element actually appears in the object selected by the user, and adjusting the measured frequency by weighting it according to the result of the comparison.

The extracting step may include searching a database in which standardized expression elements are stored in advance for the extracted expression element and, if the extracted expression element is not stored in the database, identifying the standardized expression element in the database that is closest to it as its representative expression element. In this case, the frequency measurement may include adding the frequency with which the extracted expression element appears in the comment data to the frequency with which the identified representative expression element appears in the comment data, and the visualizing step may visualize the representative expression element by reflecting the summed frequency.

The visualizing of the presentation elements may include visualizing the presentation elements against a background of a multi-dimensional scaling map (MDS map) including the presentation elements.

According to an embodiment of the present invention, a system for visualizing expression elements may include a storage device configured to store comment data on an object selected by a user, an expression element extractor configured to extract a plurality of expression elements from the stored comment data, and a visualization unit configured to visualize the extracted expression elements based on a distribution according to the semantic distances between them.

According to the present invention, a user can intuitively check, through a visualization graph, the expression elements felt by people who have already used an object before using the object personally. Because the user can see what feelings people have about the object, the user can easily select an object.

In addition, by providing the visualization graph generated through the present invention on a website as a script program, the graph can be provided to a large number of users simultaneously.

In addition, the present invention can be provided through a web page on the browser without installing a separate program, so that whenever the comment data is updated, the user can be provided with the analysis result in real time without a new data management or distribution procedure.

In addition, the present invention may intuitively confirm the response of the public opinion to the policy when a government or a public agency announces a policy or a plan and people express their intention through the Internet.

In addition, public opinion on various incidents involving a company, whether they occur on or off the Internet, can be collected, and the company can analyze changes in the public response in real time and respond accordingly.

In addition, by visualizing a plurality of elements on which an emotion or opinion is expressed with respect to an object based on a semantic distance, an overall distribution of the emotion or opinion expressed with respect to the object may be intuitively provided to the user.

In addition, while the emotions or opinions representatively expressed about one object may be visualized, visualizing the plurality of expression elements based on relative semantic distance also lets the user intuitively recognize the relative distances between, and the distribution of, the expression elements expressed about the object.

In addition, the distribution based on semantic distance can be visualized by reflecting not only text but also various non-verbal elements capable of expressing emotions or opinions, such as emoticons and icons. Because the visualization is free from the limitations of text, opinions or emotions expressed in various foreign languages can also be visualized within a single frame.

FIG. 1 is a diagram showing the emotional vocabulary selected for producing an emotional vocabulary distribution map according to an embodiment of the present invention.

FIG. 2 is a diagram showing the maximum value of the TF-IDF score of each emotional word shown in FIG. 1.

FIG. 3 is a diagram illustrating the 36 emotional words finally selected from among the emotional words shown in FIG. 1.

FIG. 4 is a diagram illustrating an emotional vocabulary distribution map according to an embodiment of the present invention.

FIGS. 5 to 8 are diagrams illustrating, in heat map form, expression elements extracted from comment data of objects according to an embodiment of the present invention.

FIG. 9 is a flowchart illustrating a method of visualizing an expression element according to an embodiment of the present invention.

FIG. 10 is a flowchart illustrating a method of visualizing expression elements according to their measured frequency according to an embodiment of the present invention.

FIG. 11 is a flowchart illustrating checking whether new vocabulary has been added according to an embodiment of the present invention.

FIG. 12 is a flowchart illustrating a case where a new expression element is added according to an embodiment of the present invention.

FIG. 13 is a view showing a process of determining the validity of an expression element according to an embodiment of the present invention.

FIG. 14 is a view showing a process of determining the validity of an expression element using the frequency of the expression element as a criterion according to an embodiment of the present invention.

FIG. 15 is a flowchart illustrating a process of determining the validity of expression elements according to an embodiment of the present invention.

FIG. 16 is a flowchart illustrating a method of controlling the influence of an expression element when a specific expression element is concentrated according to an embodiment of the present invention.

FIG. 17 is a flowchart illustrating a method of assigning a weight according to the frequency with which a specific expression element actually appears in a specific object according to an embodiment of the present invention.

FIG. 18 is a flowchart illustrating a method of mapping an expression element to a pre-stored standard-form expression element and measuring a frequency according to an embodiment of the present invention.

FIG. 19 is a diagram illustrating a system for visualizing an expression element according to an embodiment of the present invention.

FIG. 20 is a diagram illustrating a system for visualizing an expression element by identifying a new expression element according to an embodiment of the present invention.

FIG. 21 is a diagram illustrating a system for visualizing an expression element by measuring and adjusting a frequency of the expression element according to an embodiment of the present invention.

FIG. 22 is a view showing in detail the expression element extraction unit according to an embodiment of the present invention.

FIGS. 23 to 27 are views showing different visualization methods according to an embodiment of the present invention.

FIG. 28 is a diagram illustrating a three-dimensional application of a heat map visualization method according to an embodiment of the present invention.

FIG. 29 is a diagram illustrating an expression element visualization method using contour lines according to an embodiment of the present invention.

FIG. 30 is a view showing the contour map shown in FIG. 29 in three dimensions.

FIGS. 31 to 33 are diagrams illustrating a utilization method based on a semantic map according to an embodiment of the present invention.

Other objects and features of the present invention in addition to the above object will be apparent from the description of the embodiments with reference to the accompanying drawings.

Preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In describing the present invention, when it is determined that the detailed description of the related well-known configuration or function may obscure the gist of the present invention, the detailed description thereof will be omitted.

However, the present invention is not restricted or limited by the embodiments. Like reference numerals in the drawings denote like elements.

The present invention relates to a visualization method and system using expression elements collected from comment data of an object. Here, the object is something selected by a user that evokes human emotions, such as a movie, a product, a novel, a game, or a trip. The invention can visualize the emotions that appear in comments on or reviews of such objects.

In the following embodiment, the object is limited to a movie, and the visualization method and system are described using comment data about movies.

The comment data of a movie may be data collected through a web service built for this purpose, or comment data accumulated on large portals and community bulletin boards may be collected individually using a program.

As an embodiment of the present invention, a web crawler may be used to automate the collection of emotional vocabulary containing users' emotions from comment data about movies. The crawler collects the comments and reviews of specific movies as unrefined data from the movie pages of large portals (Naver, Daum, etc.), the collected data is processed into a form usable for research, and the emotional vocabulary is extracted by analyzing the refined data. The emotional vocabulary collected through the crawler can then be linked to the situation in which a movie is viewed, so as to recommend a movie that matches the user's motivation for use.

To visualize the frequency of the emotional vocabulary appearing for a movie, the location of each emotional word must be specified on a two-dimensional plane. To this end, position coordinates on the two-dimensional plane may be derived using the correlations between the emotional words. To produce the distribution map of emotional vocabulary, the emotional words that can be felt when watching a movie were first selected from 834 emotional terms, with reference to the study by Han Duk-woong and Kang Hye-ja (2000) on the appropriateness and frequency of experience of Korean emotional terms. One expert from the Department of Korean Language and Literature at Ajou University and two of the present inventors selected only those emotional words on which they reached agreement, arriving at a final set of 100 emotional words.

In addition to the expert analysis, a survey was conducted on the 100 selected emotional words to identify those that users feel most strongly when watching a movie. The survey was administered to 30 students from the Department of Media at Ajou University. It gave a brief conceptual description of the emotions that can be felt when watching a movie and asked how strongly each emotion can be felt. The questionnaire began with the prompt 'Please think about the different kinds of movie stories you have seen so far. How strongly do you feel each of the following emotions when you watch a movie?' Answers were given on a seven-point Likert-type scale, with one point meaning 'not at all' and seven points meaning 'very relevant'.

Thus, expert analysis and a user survey were used to collect the emotional vocabulary best felt when watching a movie, with the aim of recommending movies according to the user's motivation for use. To retain only highly relevant emotional words, 32 emotional words whose mean score was below 4.00 ('normal') were removed, leaving 68 emotional words suitable for movie recommendation.

FIG. 1 shows the 68 emotional words suitable for movie recommendation selected in this way.

FIG. 2 shows the maximum TF-IDF score that each emotional word can take. To further prune the 68 emotional words of FIG. 1 against actual movie data, the TF-IDF score of each emotional word appearing in comments on or reviews of movies was derived.

In this case, the term frequency (TF) is a value indicating how often a specific word appears in a document, and the document frequency (DF) is the number of documents in which a specific word appears; its inverse is called the inverse document frequency (IDF).
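The TF and IDF quantities defined above can be sketched as follows. This is only an illustration under stated assumptions: all comments for one movie are treated as a single document, TF is the relative frequency within that document, and IDF is the common logarithmic form log(N / DF). The source does not say which TF-IDF variant was used, and the function name `tf_idf` and the sample tokens are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs, vocabulary):
    """Return {doc_id: {word: score}} for tokenized documents.

    docs: {doc_id: list of tokens}; vocabulary: words to score.
    Assumption: TF is the relative frequency within a document and
    IDF is log(N / DF); the source does not specify the variant.
    """
    n_docs = len(docs)
    counts = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}
    # DF: number of documents in which each word appears at least once.
    df = {w: sum(1 for c in counts.values() if c[w] > 0) for w in vocabulary}
    scores = {}
    for doc_id, tokens in docs.items():
        total = len(tokens) or 1
        scores[doc_id] = {
            w: (counts[doc_id][w] / total) * math.log(n_docs / df[w]) if df[w] else 0.0
            for w in vocabulary
        }
    return scores

# Made-up token lists standing in for the comments of two movies.
docs = {
    "movie_a": ["sad", "fun", "fun", "boring"],
    "movie_b": ["fun", "scary", "scary", "scary"],
}
scores = tf_idf(docs, ["fun", "sad", "scary"])
```

In this toy input, "fun" appears in every document, so its IDF is zero and its score vanishes, which is the down-weighting effect the description attributes to TF-IDF.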

Because FIG. 2 reports maximum values, the value shown for 'Amazement', for example, means that its TF-IDF score ratio is 0.8% or less in every movie, whereas a value of 42% means that the TF-IDF score ratio reaches 42% in at least one movie.

FIG. 3 shows the 36 emotional words finally selected after removing the emotional words whose TF-IDF score was less than 10%.
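The pruning step can be sketched as a simple threshold on the per-word maximum. This is a hedged illustration: it assumes the 10% cut is applied to each word's maximum score ratio over all movies, which is how FIG. 2 is described but is not stated explicitly for the selection itself. The function name, data layout, and all numeric values other than the 0.8% and 10% figures are made up.

```python
def select_vocabulary(score_ratio, threshold=0.10):
    """Keep words whose maximum score ratio over all movies reaches the threshold.

    score_ratio: {word: {movie: ratio in [0, 1]}} (hypothetical layout).
    Returns the kept words (sorted) and the per-word maxima.
    """
    max_ratio = {w: max(per_movie.values()) for w, per_movie in score_ratio.items()}
    kept = sorted(w for w, m in max_ratio.items() if m >= threshold)
    return kept, max_ratio

# Illustrative values only; they are not taken from FIG. 2.
score_ratio = {
    "Amazement": {"movie_a": 0.008, "movie_b": 0.003},
    "Sad": {"movie_a": 0.25, "movie_b": 0.05},
}
kept, max_ratio = select_vocabulary(score_ratio)
```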

To derive the semantic distances between the 36 finally selected emotional words shown in FIG. 3, the distances between similar and dissimilar emotional words were measured and their correlation analyzed; Multi-Dimensional Scaling (MDS) can be used for this purpose.

Multidimensional scaling is a statistical technique that represents objects in a space according to the relative distances calculated between them, and it is a background technique for measuring similarity and dissimilarity in data visualization.

The advantage of multidimensional scaling is that a semantic map can be constructed for entities of which only the relative distances are known, and that the map can be built on psychological distance as well as physical distance.

For the multidimensional scaling analysis according to an embodiment of the present invention, a survey on the semantic distances between the 36 emotional words was conducted with a total of 20 subjects, 11 male and 9 female students, in 20 universities in Gyeonggi-do and Seoul. The questionnaire placed the 36 emotional words on both the horizontal and vertical axes (68x68), and each pair was rated on a Likert scale, with 3 points if the two emotional words were closest in meaning and -3 points if they were farthest apart. Based on the data recorded by the 20 subjects, the UCINET program, which supports various network analysis methods, was used; the resulting Metric MDS map of the semantic distances between the 36 emotional words, selected from the 68 emotional words, is shown in FIG. 4.
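The path from survey ratings to a two-dimensional map can be sketched as below. This is not the UCINET procedure used in the source. It is a minimal pure-Python metric MDS using the SMACOF iteration (repeated Guttman transforms), swapped in for illustration. The mapping from a mean closeness rating in [-3, 3] to a distance (3 minus the rating) is an assumption, and every function name and the three-word sample matrix are hypothetical.

```python
import math
import random

def to_dissimilarity(mean_closeness, max_score=3.0):
    """Map mean closeness ratings in [-3, 3] to distances in [0, 6] (assumed mapping)."""
    n = len(mean_closeness)
    return [[0.0 if i == j else max_score - mean_closeness[i][j]
             for j in range(n)] for i in range(n)]

def pairwise(points):
    """Euclidean distance matrix of a list of coordinate lists."""
    return [[math.dist(p, q) for q in points] for p in points]

def stress(points, target):
    """Raw stress: squared error between embedded and target distances."""
    d = pairwise(points)
    n = len(points)
    return sum((d[i][j] - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def smacof(target, dims=2, iterations=300, seed=0):
    """Metric MDS by iterating the Guttman transform with unit weights."""
    n = len(target)
    rng = random.Random(seed)
    x = [[rng.uniform(-1, 1) for _ in range(dims)] for _ in range(n)]
    for _ in range(iterations):
        d = pairwise(x)
        new_x = []
        for i in range(n):
            row = [0.0] * dims
            for j in range(n):
                if i == j or d[i][j] == 0.0:
                    continue  # coincident points contribute nothing
                ratio = target[i][j] / d[i][j]
                for k in range(dims):
                    row[k] += ratio * (x[i][k] - x[j][k])
            new_x.append([v / n for v in row])
        x = new_x
    return x

# Hypothetical mean ratings for three emotional words (symmetric matrix).
mean_closeness = [
    [3.0, 2.0, -1.5],
    [2.0, 3.0, -1.0],
    [-1.5, -1.0, 3.0],
]
target = to_dissimilarity(mean_closeness)
coords = smacof(target)
```

Each Guttman transform does not increase the stress, so the final layout fits the target distances at least as well as the random starting layout. An eigenvalue-based classical MDS would be an equally reasonable choice where a linear algebra library is available.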

As a result, emotional words related to the representative words 'Happy' and 'Surprise' were distributed in the positive direction of the X axis, and emotional words related to the representative words 'Anger' and 'Disgust' in the negative direction of the X axis. Emotional words related to the representative words 'Fear' and 'Surprise' were distributed in the positive direction of the Y axis, and emotional words related to the representative words 'Sad' and 'Boring' in the negative direction of the Y axis.

Accordingly, in terms of the nature of the emotional words, positive emotional words are distributed in the positive (+) direction of the X axis and negative emotional words in the negative (-) direction of the X axis.

Likewise, dynamic emotional words (those that may be accompanied by relatively large gestures when felt) are distributed in the positive (+) direction of the Y axis, and static emotional words (those that may be accompanied by relatively small gestures when felt) in the negative (-) direction of the Y axis.

The words related to 'Happy', 'Sad', 'Anger', 'Fear', 'Disgust', and 'Boring' each form a clear cluster. The words related to the representative word 'Surprise', however, are split between the 'Happy' cluster and the 'Fear' cluster. This can be interpreted as reflecting two dominant cases when users watch a movie: 'when emotions are overwhelmed by great joy' and 'when emotions are overwhelmed by sudden fear'.

To visualize the emotional vocabulary extracted from comment data for a movie on the map described with reference to FIGS. 1 to 4, the frequency of each emotional word constituting the MDS map is required. The frequency of each emotional word in each movie is measured by matching the comment data against the emotional vocabulary selected through the preceding process.

In addition, the TF-IDF score is calculated so as to lower the weight of words that appear frequently regardless of the nature of the movie. Finally, the TF-IDF score of each selected emotional word can be visualized.

The final visualization graph is based on the MDS map of the emotional vocabulary and can be represented as a heat map composed of small square cells. All cells are initialized to a value of 0, and the value of a cell increases according to the TF-IDF score of the emotional word located in that cell. As the value of a cell increases, its color changes, so that high and low TF-IDF scores of the corresponding emotional word can be distinguished. In addition, a cell with a high value raises the values of the surrounding cells, so that the graph takes the form of a topographical map.
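The cell-accumulation idea can be sketched as follows. The source does not say how a high-valued cell affects its neighbours, so the linear falloff, the `radius` parameter, the grid size, and the scaling of MDS coordinates into [0, 1) are all assumptions made for illustration; the function name and sample words are hypothetical.

```python
def build_heat_map(positions, scores, size=20, radius=3):
    """Accumulate scores on a size x size grid with a linear falloff.

    positions: {word: (x, y)} with MDS coordinates scaled to [0, 1).
    scores: {word: TF-IDF score}. Falloff shape and radius are assumptions.
    """
    grid = [[0.0] * size for _ in range(size)]  # every cell starts at 0
    for word, (x, y) in positions.items():
        cx, cy = int(x * size), int(y * size)
        score = scores.get(word, 0.0)
        for gy in range(max(0, cy - radius), min(size, cy + radius + 1)):
            for gx in range(max(0, cx - radius), min(size, cx + radius + 1)):
                dist = max(abs(gx - cx), abs(gy - cy))
                # The word's own cell gets the full score; neighbours get less.
                grid[gy][gx] += score * (1 - dist / (radius + 1))
    return grid

# Made-up positions and scores for two emotional words.
positions = {"happy": (0.825, 0.525), "sad": (0.225, 0.125)}
scores = {"happy": 0.4, "sad": 0.1}
grid = build_heat_map(positions, scores)
```

Mapping each cell value to a color, for instance through a fixed color ramp, would then give the heat map. Where two words lie close together their contributions add, which produces the ridge-like, topographical appearance described above.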

FIG. 5 is a graph visualizing the distribution of the emotional vocabulary appearing in viewers' comment data for the movie 'Seolguk Train'. As shown in FIG. 5, viewers of 'Seolguk Train' responded that it was fun and great, but words expressing sadness and boredom also appear at high frequency. Indeed, the comments on 'Seolguk Train' include many reviews from viewers who were disappointed with the movie.

FIG. 6 visualizes the movie 'Planetary Murder Case' in the form of a heat map. Among the emotions of viewers of the horror film 'Planetary Murder Case', 'surprise' is the most frequent emotional word, and emotional words related to fear also appear at high frequency.

FIG. 7 visualizes the movie 'Don Krai Mami' in the form of a heat map. For 'Don Krai Mami', which was based on an actual crime, the distribution of viewers' emotional vocabulary shows a high frequency of 'anger'.

FIG. 8 visualizes the movie 'WangNang Sori' in the form of a heat map. For 'WangNang Sori', viewers' emotions are concentrated on 'Sad' and 'Emotional'.

These examples show that the comment data collected from comments written after watching a movie forms an emotional vocabulary pattern that corresponds to the genre and story characteristics of the movie.

As an embodiment of the present invention, a method of extracting and visualizing emotional vocabulary using comment data on a movie has been described, but the invention may equally target cognitive activities such as thoughts, intentions, evaluations, opinions, arguments, and rebuttals, and emotional responses such as feelings, sentiments, desires, and attitudes.

In addition, the objects to which the present invention may be applied include, on the emotional side, human feelings, sentiments, desires, and attitudes, and, on the cognitive side, thoughts, intentions, evaluations, opinions, arguments, and rebuttals. In terms of relations, they include cultural content, human relations (communication, conflict), social relations (multiculturalism, etc.), and relations with technology (cultural lag, etc.).

The method of visualizing expression elements may include extracting a plurality of expression elements from the comment data collected for the object selected by the user (S910), and visualizing the extracted expression elements based on a distribution according to the semantic distances between them (S920).

In this case, the comment data on the object refers to all comment data including emotions of people, such as reviews on movies, product reviews, novel reviews, game reviews, travel reviews, and services.

In addition, the expression elements include words, paragraphs, emoticons, and the like that represent the emotions of people extracted from the comment data.

In addition, the visualization based on a distribution according to the semantic distances between the expression elements may take the form of a heat map or contour lines based on the multi-dimensional scaling map (MDS map) described with reference to FIGS. 1 to 4.

The method of visualizing expression elements may include extracting a plurality of expression elements from the comment data collected for the object selected by the user (S910), measuring the frequency with which each extracted expression element appears in the comment data (S930), and visualizing the extracted expression elements based on a distribution according to the semantic distances between them (S920). In this case, the expression elements can be visualized in the form of a heat map, contour lines, or the like, drawn on a multi-dimensional scaling map (MDS map) containing the expression elements, according to the measured frequencies.

Also, when the expression elements are extracted and their frequencies are measured, an extracted expression element that is not in standard form is mapped to the corresponding standard-form expression element stored in a dictionary, and the frequency of each expression element in the comment data of the object is measured on the basis of the mapped standard-form expression element.
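The mapping of non-standard forms onto standard-form expression elements can be sketched with a lookup table. The dictionary contents, the set of standard elements, and the function name are hypothetical, and a plain table lookup stands in for whatever closeness measure the actual system uses to find the nearest standardized element.

```python
from collections import Counter

# Hypothetical dictionary mapping non-standard forms to a standard-form element.
STANDARD_FORMS = {
    "saaad": "sad",
    "T_T": "sad",
    "lol": "fun",
    "funny": "fun",
}
STANDARD_ELEMENTS = {"sad", "fun", "scary"}

def count_standard_elements(tokens):
    """Count tokens per standard-form expression element.

    Tokens already in standard form count directly; known variants are
    added to the count of their standard form; anything else is ignored.
    """
    counts = Counter()
    for token in tokens:
        standard = token if token in STANDARD_ELEMENTS else STANDARD_FORMS.get(token)
        if standard is not None:
            counts[standard] += 1
    return counts

tokens = ["sad", "T_T", "lol", "great", "funny", "fun", "saaad"]
counts = count_standard_elements(tokens)
```

Here the emoticon "T_T" and the variant spelling "saaad" are both added to the frequency of "sad", so the representative element is visualized with the summed frequency.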

In the method for visualizing the expression elements illustrated in FIG. 9, a plurality of expression elements are extracted from the comment data collected for the object selected by the user (S910), and the extracted expression elements are compared with the existing expression elements (S1110). Thereafter, it is checked whether a new expression element has been added among the extracted expression elements (S1120), and the extracted plurality of expression elements may be visualized based on a distribution based on the semantic distance between them (S920). At this time, the meaning of the new expression element may be found through a technique such as context-based analysis.

In FIG. 11, it is determined whether a new expression element has been added (S921), and when a new expression element has been added, one or more adjacent expression elements are determined from among the existing expression elements, namely those whose semantic distance from the new expression element is within a predetermined criterion (S922).

In this case, the predetermined criterion may select the N existing expression elements nearest to the new expression element, or the existing expression elements whose semantic distance from the new expression element is within r.
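
The two criteria (the N nearest elements, or the elements within distance r) can be sketched as follows. The function `select_neighbors` and the sample distances are hypothetical and serve only to illustrate the selection step.

```python
def select_neighbors(distances, n_nearest=None, radius=None):
    """Pick adjacent elements from {element: semantic distance to the new element}."""
    ranked = sorted(distances.items(), key=lambda kv: kv[1])
    if radius is not None:
        # Criterion 2: keep only elements within semantic distance r.
        ranked = [(e, d) for e, d in ranked if d <= radius]
    if n_nearest is not None:
        # Criterion 1: keep only the N nearest elements.
        ranked = ranked[:n_nearest]
    return ranked

# Hypothetical distances from a new element to existing elements.
to_new = {"happy": 0.2, "sad": 0.9, "glad": 0.3, "calm": 0.6}
nearest_two = select_neighbors(to_new, n_nearest=2)
within_r = select_neighbors(to_new, radius=0.6)
```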

Subsequently, the semantic position of the new expression element is determined based on the determined semantic distances from one or more adjacent expression elements (S923), and the new expression element having the determined position is visualized (S924).

In this case, the position may be determined by assigning weights such that the more similar the meanings of the new expression element and an adjacent expression element are, the shorter the distance between them becomes. That is, when the first adjacent expression element is more similar in meaning to the new expression element than the second adjacent expression element, the position of the new expression element may be determined so that the distance between the new expression element and the first adjacent expression element is shorter than the distance between the new expression element and the second adjacent expression element. In this case, the semantic similarity between expression elements may be obtained through context-based analysis, or by other methods such as a questionnaire survey of a large number of people.
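
One possible way to realize such weighting is inverse-distance weighting, in which a more similar neighbor pulls the new element closer. This weighting scheme, the function `place_new_element`, and the sample coordinates are assumptions for illustration; the embodiments do not prescribe a particular formula.

```python
def place_new_element(neighbors, eps=1e-9):
    """neighbors: list of ((x, y), semantic_distance_to_new_element)."""
    # A smaller semantic distance yields a larger weight.
    weights = [1.0 / (d + eps) for _, d in neighbors]
    total = sum(weights)
    x = sum(w * pos[0] for w, (pos, _) in zip(weights, neighbors)) / total
    y = sum(w * pos[1] for w, (pos, _) in zip(weights, neighbors)) / total
    return (x, y)

# The first neighbor is semantically closer (0.1) than the second (0.3).
new_pos = place_new_element([((0.0, 0.0), 0.1), ((1.0, 0.0), 0.3)])
```

Here the new element lands at roughly x = 0.25, nearer to the first neighbor than to the second, consistent with the ordering described above.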

In the method of visualizing the expression element illustrated in FIG. 9, a plurality of expression elements are extracted from the comment data collected for the object selected by the user (S910), and the validity of the extracted expression element is determined (S1310). In this case, when the extracted expression element is not valid for the object selected by the user, the invalid expression element is removed (S1320).

Thereafter, the plurality of expression elements remaining after the invalid expression elements are removed may be visualized based on the distribution based on the semantic distance between them (S920). An expression element may be determined to be invalid when its meaning differs significantly from that of the other expression elements, when its frequency is markedly lower than a reference value, or when, rather than expressing specific content, it appears at a constant rate across a plurality of contents without discrimination (in this case, it may be a mechanically repeated promotion or announcement rather than a true review).

In the method of visualizing the expression element described in FIG. 13, a plurality of expression elements are extracted from the comment data collected for the object selected by the user (S910), and the frequency at which each extracted expression element appears in the comment data is measured (S1410).

Thereafter, the validity of the extracted expression element is determined using the frequency of the expression element (S1310). In this case, when the extracted expression element is not valid for the object selected by the user, the invalid expression element is removed (S1320).
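
A minimal sketch of the validity determination follows, assuming two of the criteria mentioned above: a frequency below a reference value, and a meaning far from the other elements (approximated here by a mean semantic distance). The function `remove_invalid`, both thresholds, and the sample data are hypothetical.

```python
def remove_invalid(freqs, mean_distance, min_frequency=3, max_mean_distance=0.8):
    """Drop elements that are too rare or semantically far from the rest."""
    valid = {}
    for element, freq in freqs.items():
        too_rare = freq < min_frequency
        outlier = mean_distance.get(element, 0.0) > max_mean_distance
        if not (too_rare or outlier):
            valid[element] = freq
    return valid

freqs = {"moving": 12, "fun": 7, "asdf": 1, "click-here": 9}
mean_distance = {"moving": 0.3, "fun": 0.4, "asdf": 0.5, "click-here": 0.95}
valid = remove_invalid(freqs, mean_distance)
```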

In the method of visualizing the expression element illustrated in FIG. 9, a plurality of expression elements are extracted from the comment data collected for the object selected by the user (S910), and the frequency at which each extracted expression element is extracted from objects other than the object selected by the user is identified (S1510). Then, it is determined whether the extracted expression element is extracted at a certain frequency or more from a certain number or more of objects other than the object selected by the user (S1520), and the weight is adjusted for an expression element that is extracted at a certain frequency or more from a certain number or more of objects (S1530). Thereafter, the plurality of extracted expression elements are visualized based on the distribution based on the semantic distance between the extracted expression elements (S920).

Accordingly, expression elements that appear equally across all objects (contents) without discrimination can be regarded as invalid.
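
Steps S1510 to S1530 can be sketched as follows, assuming that an element found at or above a threshold frequency in enough other objects has its weight reduced by a fixed factor. The function `downweight_common`, the thresholds, and the factor 0.5 are hypothetical.

```python
def downweight_common(selected_freqs, other_objects, min_freq=5, min_objects=2, weight=0.5):
    """Lower the weight of elements that are also frequent in many other objects."""
    adjusted = {}
    for element, freq in selected_freqs.items():
        # Count the other objects in which the element reaches the threshold.
        hits = sum(1 for freqs in other_objects if freqs.get(element, 0) >= min_freq)
        adjusted[element] = freq * weight if hits >= min_objects else float(freq)
    return adjusted

selected = {"good": 20, "eerie": 8}
others = [{"good": 30}, {"good": 12, "eerie": 1}, {"good": 9}]
adjusted_common = downweight_common(selected, others)
```

Setting the factor to 0 would correspond to treating the element as invalid, as described above.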

In the method of visualizing the expression element illustrated in FIG. 9, a plurality of expression elements are extracted from the comment data collected for the object selected by the user (S910), and the frequency at which the expression element is extracted is measured (S1610). Thereafter, the measured frequency is adjusted by assigning a weight to it according to the measured frequency (S1620). In this case, the expression elements may be visualized by reflecting the adjusted frequency (S920).

Accordingly, when a specific expression element is excessively concentrated in a specific content, its influence may be controlled by adjusting the weight. That is, because an excessively concentrated expression element can cause the influence of other expression elements to be underestimated, the influence of that expression element is adjusted.
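
One way to weight a frequency according to its own magnitude is logarithmic damping, which compresses very large counts more than small ones. The choice of a logarithm and the function `dampen` are assumptions for this sketch; the embodiments do not specify the weighting function.

```python
import math

def dampen(freqs):
    """Compress large frequencies so one element does not dominate the map."""
    return {element: math.log1p(freq) for element, freq in freqs.items()}

raw = {"fun": 1000, "sad": 10}
damped = dampen(raw)
```

The ordering of the elements is preserved, while the ratio between the most and least frequent element shrinks from 100 to roughly 3.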

In the method of visualizing the expression element illustrated in FIG. 9, a plurality of expression elements are extracted from the comment data collected for the object selected by the user (S910), and the frequency at which the expression element is extracted from the comment data is measured (S1710). Then, the frequency at which the expression element appears in the object selected by the user is compared with the measured frequency (S1720), and the measured frequency is adjusted by weighting it according to the result of the comparison (S1730).

Accordingly, by comparing the frequency at which an expression element appears in the specific object (content / movie) with its frequency of appearance in the comment data, a low weight may be given when the frequency of appearance is low.
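
The comparison in S1720 and S1730 can be sketched by weighting each frequency with the share of the element's overall occurrences that fall within the selected object. This ratio-based weight, the function `adjust_by_share`, and the sample counts are assumptions for illustration.

```python
def adjust_by_share(object_freqs, overall_freqs):
    """Weight each frequency by the element's share within the selected object."""
    adjusted = {}
    for element, freq in object_freqs.items():
        # If no overall count is known, assume all occurrences are in this object.
        overall = overall_freqs.get(element, freq)
        share = freq / overall if overall else 0.0
        adjusted[element] = freq * share
    return adjusted

in_object = {"scary": 30, "good": 40}
in_all_comments = {"scary": 40, "good": 400}
adjusted_share = adjust_by_share(in_object, in_all_comments)
```

In this example "good" is more frequent in the object but common everywhere, so it receives a low weight, while "scary" keeps most of its frequency.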

When a plurality of expression elements are extracted from the comment data collected for the object selected by the user in the method of visualizing the expression element shown in the drawing, it is searched whether each extracted expression element is stored in a database in which standardized expression elements are stored in advance. If the extracted expression element is not stored in the database, the standardized expression element in the database closest to the extracted expression element is identified as the representative expression element of the extracted expression element (S912).

Then, when the frequency at which the extracted expression elements appear in the comment data is measured (S930), the frequency of the extracted expression element is added to the frequency at which the identified representative expression element is extracted in the comment data. Then, when the plurality of extracted expression elements are visualized based on the distribution based on the semantic distance between the extracted expression elements (S920), the representative expression elements are visualized by reflecting the summed frequency.

Accordingly, when an expression element is not of a standard type, it is mapped to an expression element of the standard type stored in the emotional vocabulary dictionary (a pre-stored database), and the frequency in the comment data of each object can be measured based on the mapped standard-type expression elements.
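
The mapping to a representative expression element and the summing of frequencies can be sketched as follows. String similarity from the standard-library `difflib` module is used here as a stand-in for "closest"; the embodiments may instead rely on semantic closeness. The function `standardized_frequencies` and the sample vocabulary are hypothetical.

```python
import difflib
from collections import Counter

def standardized_frequencies(tokens, standard_elements):
    """Count tokens under their standard-type (representative) element."""
    freqs = Counter()
    for token in tokens:
        if token in standard_elements:
            representative = token
        else:
            # Assumption: the closest standard element is the most similar string.
            representative = difflib.get_close_matches(
                token, standard_elements, n=1, cutoff=0.0)[0]
        # Non-standard variants add to the representative's frequency.
        freqs[representative] += 1
    return freqs

standard = ["happy", "sad", "scary"]
counts = standardized_frequencies(["happy", "happyyy", "sad", "saad", "scary"], standard)
```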

The system 1900 for visualizing an expression element may be, for example, a computing system and includes a storage device 1910 and a processor 1920. In this case, the processor 1920 may include an expression element extractor 1930, a frequency measurer 1940, a validity determiner 1950, and a visualization unit 1960.

The storage device 1910 stores comment data on the object selected by the user, the expression element extractor 1930 extracts a plurality of expression elements from the comment data stored in the storage device 1910, and the visualization unit 1960 visualizes the plurality of extracted expression elements based on the distribution based on the semantic distance between them.

In addition, when the frequency measuring unit 1940 measures the frequency at which the extracted expression elements appear in the comment data, the visualization unit 1960 may visualize the extracted expression elements according to the measured frequency. The validity determination unit 1950 may determine the validity of each extracted expression element and, if the extracted expression element is not valid for the object selected by the user, remove the invalid expression element.

In addition, the frequency measuring unit 1940 may identify the frequency at which an expression element extracted by the expression element extractor 1930 is extracted from objects other than the object selected by the user, and may determine whether the extracted expression element is extracted at a predetermined frequency or more from a predetermined number or more of objects other than the object selected by the user.

In addition, the expression elements include words, paragraphs, emoticons, and the like that represent the emotions of people extracted from the comment data, and may be visualized in the form of a heat map or contour lines on the multi-dimensional scaling map (MDS map) described with reference to FIGS. 1 to 4, based on a distribution based on the semantic distance between the plurality of expression elements.
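
The cooperation of the units of the system 1900 can be sketched as a simple pipeline. The class `VisualizationSystem`, the whitespace tokenizer, the lexicon, and the text-based stand-in for the visualization unit are all assumptions for illustration and do not represent the actual implementation.

```python
from collections import Counter

class VisualizationSystem:
    """Toy counterpart of system 1900; reference numerals are noted in comments."""

    def __init__(self, comments, lexicon, min_frequency=2):
        self.comments = comments            # storage device 1910
        self.lexicon = set(lexicon)         # hypothetical known expression elements
        self.min_frequency = min_frequency  # hypothetical validity threshold

    def extract(self):                      # expression element extractor 1930
        return [w for c in self.comments for w in c.lower().split() if w in self.lexicon]

    def measure(self, elements):            # frequency measurer 1940
        return Counter(elements)

    def validate(self, freqs):              # validity determiner 1950
        return {e: f for e, f in freqs.items() if f >= self.min_frequency}

    def visualize(self, freqs):             # visualization unit 1960 (text stand-in)
        ordered = sorted(freqs.items(), key=lambda kv: -kv[1])
        return [f"{e}: {'#' * f}" for e, f in ordered]

    def run(self):
        return self.visualize(self.validate(self.measure(self.extract())))

system = VisualizationSystem(
    comments=["Scary but fun", "so scary", "fun fun ride", "boring"],
    lexicon=["scary", "fun", "boring"],
)
report = system.run()
```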

The system 1900 for identifying new expression elements and visualizing the expression elements includes a storage device 1910 and a processor 1920. In this case, the processor 1920 includes an expression element extraction unit 1930, an expression element comparison unit 1970, a new expression element checking unit 1980, and a visualization unit 1960.

In this case, the expression element comparison unit 1970 compares the expression elements extracted by the expression element extraction unit 1930 with the existing expression elements, and the new expression element checking unit 1980 checks whether a new expression element has been added among the extracted expression elements.

The system 1900 for visualizing a representation includes a storage device 1910 and a processor 1920. In this case, the processor 1920 includes an expression element extraction unit 1930, an expression element comparison unit 1970, a new expression element checking unit 1980, and a visualization unit 1960.

The frequency measurer 1940 identifies the frequency at which an expression element extracted by the expression element extractor 1930 is extracted from objects other than the object selected by the user, and may determine whether the extracted expression element is extracted at a certain frequency or more from a certain number or more of objects other than the object selected by the user.

In this case, the frequency adjusting unit 1990 may adjust the frequency of the expression element by assigning a weight to it according to the measured frequency of the expression element.

In this case, the visualization unit 1960 may visualize the expression elements by reflecting the frequency adjusted by the frequency adjustment unit 1990.

Accordingly, even when a particular expression element appears excessively concentrated in a specific content, the weight may be lowered to control the influence of the expression element.

In addition, the frequency measuring unit 1940 may measure the frequency at which the expression element is extracted from the stored comment data, and compare the frequency at which the expression element appears in the object selected by the user with the identified frequency. In this case, the frequency adjusting unit 1990 may adjust the frequency at which the expression element is extracted by weighting it according to the result of the comparison.

The expression element extractor 1930 illustrated in FIGS. 19 to 21 includes an expression element search unit 1931 and an expression element identification unit 1932.

The expression element searching unit 1931 searches whether the extracted expression element is stored in the database in which standardized expression elements are stored in advance. When the extracted expression element is not stored in the database, the expression element identification unit 1932 identifies the standardized expression element in the database that is closest to the extracted expression element as the representative expression element of the extracted expression element.

In this case, the frequency measuring unit 1940 adds the frequency at which the extracted expression element appears in the comment data to the frequency at which the identified representative expression element is extracted in the comment data, and the visualization unit 1960 visualizes the representative expression elements by reflecting the summed frequency.

Accordingly, when an expression element is not of a standard type, it is mapped to an expression element of the standard type stored in the emotional vocabulary dictionary (a pre-stored database), and the frequency in the comment data of each object can be measured based on the mapped standard-type expression elements.

FIG. 23 is a visualization graph other than the heat map form according to an embodiment of the present invention, showing the present invention in the form of a scatter plot. In this case, the color may become more red as the frequency of the expression vocabulary increases. FIG. 24 is a graph in the form of Small Multiples.
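
The relationship between frequency and redness in the scatter plot form can be sketched with a linear white-to-red color scale. The function `frequency_to_rgb` and the linear interpolation are hypothetical choices for illustration.

```python
def frequency_to_rgb(freq, max_freq):
    """Map a frequency to an RGB color from white (low) to red (high)."""
    # Normalize to [0, 1], guarding against a zero maximum.
    t = 0.0 if max_freq <= 0 else min(max(freq / max_freq, 0.0), 1.0)
    # Reduce the green and blue channels as the frequency grows.
    level = round(255 * (1.0 - t))
    return (255, level, level)

lowest = frequency_to_rgb(0, 10)
highest = frequency_to_rgb(10, 10)
```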

FIG. 25 is a diagram illustrating the present invention in the form of contour lines. In this case, the higher the frequency of the expression vocabulary, the higher the contour value. FIG. 26 is a diagram of Choropleth Maps. In this case, the present invention is not necessarily embodied in a rectangular form, and may be embodied in a natural topography or a natural shape such as a part of a map.

FIG. 27 is a diagram illustrating the present invention with statistical maps. In this case, when expression vocabulary is generated for each city, province, or group according to the comment data or the opinion selected by the user, the expression vocabulary with high frequency is presented to the user. In addition, the high-frequency expression vocabulary selected by the user may be displayed on the map.

The heat-map visualization described in the present invention is shown on a two-dimensional plane, but can be transformed into a three-dimensional form while maintaining the same properties.

The deformation of the dimension, the angle, the size of the pixel, and the color may be adjusted according to the frequency of the expression elements, and FIG. 28 is a diagram illustrating a heat-map form described in the present invention in a three-dimensional form.

FIG. 29 is a diagram illustrating two-dimensional contour lines according to the frequency of the expression vocabulary, and the color and size of the contour line may be adjusted according to the frequency of the expression vocabulary.

FIG. 30 is a diagram illustrating the two-dimensional contour map illustrated in FIG. 29 as three-dimensional contour lines according to the frequency of the expression vocabulary. The color, height, and size of the contour line may be adjusted according to the frequency of the expression vocabulary.

FIG. 31 illustrates an embodiment utilizing positioning on the multi-dimensional scaling map (MDS map) used in the present invention. Compared with conventional positioning based on four axes, MDS map positioning enables multi-dimensional positioning based on the various properties displayed on the MDS map.

For example, in the second quadrant, Audi is located in the same quadrant as BMW, but closer to the future-oriented image. Similarly, in the fourth quadrant, the SM is closer to the relaxed image than the KIA.

Positioning using the MDS Map can be used for image positioning of goods, people and characters as shown in Figs.

The above-described embodiments of the present invention have been described based on expression elements (elements including vocabulary, emoticons, emotions, evaluations, and opinions) extracted from comment (review) data for one object (content). However, the spirit of the present invention is not limited to the case in which the expression element of the comment data for one object is intuitively expressed in one emotion map.

That is, according to another embodiment of the present invention, an edit menu for the user or a menu providing a comparative analysis function for two or more objects may be provided. In this case, the user may select a first object and a second object to compare the expression elements in the reviews for the first object with the expression elements in the reviews for the second object. In this case, the comparison menu may perform a set operation (union, intersection, difference) between the expression elements in the reviews for the first object and those in the reviews for the second object, and compare the two sets. In addition, a re-draw menu may be provided to re-visualize the resulting union or subset.
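
The set operations of the comparison menu can be sketched directly with Python sets. The function `compare_objects` and the sample expression elements are hypothetical.

```python
def compare_objects(first, second):
    """Compare the expression elements of two objects with set operations."""
    first, second = set(first), set(second)
    return {
        "union": first | second,
        "intersection": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }

movie_a = {"scary", "tense", "fun"}
movie_b = {"fun", "touching", "sad"}
comparison = compare_objects(movie_a, movie_b)
```

Any of the resulting sets could then be passed to the re-draw step so that only the selected elements are visualized again.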

In addition, not only may a single set of visualization data exist for the same object, but two or more visualization versions corresponding to different time versions (or time layers) may also exist through time-series versioning, and changes in the position and properties of nodes over time may be tracked. At this time, the attributes of each node (expression element) at each time may be represented by area, color, and the like, and may reflect frequency, concentration, and the like. The heat map described above is one example.
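
Tracking a node attribute across time layers can be sketched as follows, assuming that each time layer stores the frequency of each expression element. The function `track_over_time`, the layer labels, and the counts are hypothetical.

```python
def track_over_time(layers):
    """layers: list of (label, {element: frequency}) in chronological order."""
    # Collect every element that appears in any time layer.
    elements = sorted(set().union(*(freqs for _, freqs in layers)))
    # For each element, list its frequency in each layer (0 if absent).
    return {e: [freqs.get(e, 0) for _, freqs in layers] for e in elements}

layers = [
    ("week 1", {"hyped": 9, "fun": 4}),
    ("week 2", {"hyped": 3, "fun": 6, "overrated": 5}),
]
history = track_over_time(layers)
```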

The expression element visualization method according to an embodiment of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those skilled in the computer software arts. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

Although the present invention has been described above with reference to specific matters such as specific components, limited embodiments, and drawings, these are provided only to help a more general understanding of the present invention, and the present invention is not limited to the above embodiments. Those skilled in the art can make various modifications and variations from these descriptions.

Therefore, the spirit of the present invention should not be limited to the described embodiments, and not only the following claims but also all equivalents of the claims fall within the scope of the present invention.

The present invention relates to a visualization method and system using expression elements collected from comment data of an object. For example, if the object is a content item, it relates to a technique for visualizing the expression elements that appear in comments in which consumers of the content express emotions or opinions about the object.

The present invention visualizes the expression elements expressing users' emotions or opinions that appear in user comments on an object, and analyzes not only objective information such as manufacturer and price provided by existing object information, but also the expression elements expressing the emotions or opinions that users have when using the object, whereby information that can serve as a basis for selection can be provided to a user who intends to newly use the object.

Claims

A data visualization method comprising:

extracting a plurality of expression elements from comment data collected for an object selected by a user; and

visualizing the extracted plurality of expression elements based on a distribution based on a semantic distance between the extracted plurality of expression elements.
The method of claim 1, further comprising:

measuring a frequency at which each of the extracted expression elements is extracted from the comment data,

wherein the visualizing of the expression elements comprises visualizing the extracted expression elements according to the measured frequency of the expression elements.
The method of claim 1, further comprising, after the extracting of the expression elements:

comparing the extracted expression elements with previously extracted expression elements; and

checking whether a new expression element has been added among the extracted expression elements.
The method of claim 3, wherein the visualizing of the expression elements comprises:

when a new expression element has been added among the extracted expression elements, determining one or more adjacent expression elements, among the existing expression elements, whose semantic distance from the new expression element is within a predetermined criterion; and

determining a semantic position of the new expression element based on the determined semantic distance from the one or more adjacent expression elements.
The method of claim 1, further comprising, after the extracting of the expression elements:

determining the validity of each extracted expression element; and

when the extracted expression element is invalid for the object selected by the user, removing the invalid expression element.
The method of claim 5, further comprising, after the extracting of the expression elements:

measuring a frequency at which each of the extracted expression elements is extracted from the comment data,

wherein the determining of the validity of the extracted expression element comprises determining the validity of the extracted expression element by reflecting the measured frequency of the extracted expression element.
The method of claim 1, further comprising, after the extracting of the expression elements:

identifying a frequency at which the extracted expression element is extracted from objects other than the object selected by the user; and

determining whether the extracted expression element is extracted at a predetermined frequency or more from a predetermined number or more of objects other than the object selected by the user.
The method of claim 1, further comprising, after the extracting of the expression elements:

measuring a frequency at which the expression element is extracted; and

adjusting the measured frequency by assigning a weight to the measured frequency according to the measured frequency,

wherein the visualizing of the expression elements comprises visualizing the expression elements by reflecting the adjusted frequency.
The method of claim 1, further comprising, after the extracting of the expression elements:

measuring a frequency at which the expression element is extracted from the comment data;

comparing the frequency at which the expression element appears in the object selected by the user with the measured frequency; and

adjusting the measured frequency by weighting the frequency at which the expression element is extracted according to a result of comparing the frequency at which the expression element appears in the object selected by the user with the measured frequency.
The method of claim 2, wherein the extracting of the expression elements comprises:

searching whether the extracted expression element is stored in a database in which standardized expression elements are stored in advance; and

when the extracted expression element is not stored in the database, identifying the standardized expression element in the database closest to the extracted expression element as a representative expression element of the extracted expression element,

wherein the measuring of the frequency comprises adding the frequency at which the extracted expression element is extracted from the comment data to the frequency at which the identified representative expression element is extracted from the comment data, and

wherein the visualizing of the expression elements comprises visualizing the representative expression element by reflecting the summed frequency.
The method of claim 2, wherein the visualizing of the expression elements comprises visualizing the expression elements against a background of a multi-dimensional scaling map (MDS map) including the expression elements.
A computer-readable recording medium in which a program for executing the method of any one of claims 1 to 11 is recorded.
A data visualization system comprising:

a storage device storing comment data on an object selected by a user;

an expression element extraction unit extracting a plurality of expression elements from the stored comment data; and

a visualization unit visualizing the extracted plurality of expression elements based on a distribution based on a semantic distance between the extracted plurality of expression elements.
The system of claim 13, further comprising:

a frequency measuring unit measuring a frequency at which each of the extracted expression elements is extracted from the comment data,

wherein the visualization unit visualizes the extracted expression elements according to the measured frequency of the expression elements.
The system of claim 13, further comprising:

an expression element comparison unit comparing the extracted expression elements with previously extracted expression elements; and

a new expression element checking unit checking whether a new expression element has been added among the extracted expression elements.
The system of claim 13, further comprising:

a validity determination unit determining the validity of each extracted expression element and removing the invalid expression element when the extracted expression element is invalid for the object selected by the user.
The system of claim 16, further comprising:

a frequency measuring unit measuring a frequency at which each of the extracted expression elements is extracted from the comment data,

wherein the validity determination unit determines the validity of the extracted expression element by reflecting the measured frequency of the extracted expression element.
The system of claim 13, further comprising:

a frequency measuring unit identifying a frequency at which the extracted expression element is extracted from objects other than the object selected by the user, and determining whether the extracted expression element is extracted at a certain frequency or more from a certain number or more of objects other than the object selected by the user.
The system of claim 13, further comprising:

a frequency adjusting unit measuring a frequency at which the expression elements are extracted, and adjusting the frequency of the expression elements by assigning a weight to the frequency of the expression elements according to the measured frequency,

wherein the visualization unit visualizes the expression elements by reflecting the adjusted frequency.
The system of claim 13, further comprising:

a frequency measuring unit measuring a frequency at which the expression element is extracted from the stored comment data, and comparing the frequency at which the expression element appears in the object selected by the user with the identified frequency; and

a frequency adjusting unit adjusting the frequency at which the expression element is extracted by weighting that frequency according to a result of comparing the frequency at which the expression element appears in the object selected by the user with the identified frequency.
The system of claim 14, wherein the expression element extraction unit comprises:

an expression element searching unit searching whether the extracted expression element is stored in a database in which standardized expression elements are stored in advance; and

a representative expression element identification unit identifying, when the extracted expression element is not stored in the database, the standardized expression element in the database closest to the extracted expression element as a representative expression element of the extracted expression element,

wherein the frequency measuring unit adds the frequency at which the extracted expression element is extracted from the comment data to the frequency at which the identified representative expression element is extracted from the comment data, and

wherein the visualization unit visualizes the representative expression element by reflecting the summed frequency.