JP2007264718A - User interest analyzing device, method, and program - Google Patents

User interest analyzing device, method, and program

Info

Publication number
JP2007264718A
Authority
JP
Japan
Prior art keywords
user
word
file
words
influence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2006085174A
Other languages
Japanese (ja)
Inventor
Julian Brody
Masahiro Matsumura
ブローディ ジュリアン
真宏 松村
Original Assignee
Yafoo Japan Corp
ヤフー株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yafoo Japan Corp (ヤフー株式会社)
Priority to JP2006085174A
Publication of JP2007264718A
Application status: Pending

Abstract

An object of the present invention is to provide an algorithm that estimates a user's changing interests in real time from the words that propagate between the files the user views, and a device or the like equipped with the algorithm.
The device comprises: means for inputting, as text for each file, the words contained in a plurality of files from the history viewed by a user; means for dividing the text into language units; means for extracting the “propagating words” that the user referred to across the plurality of files viewed; means for storing one or more of the “propagating words”; means for obtaining a predetermined “influence degree” from the appearance frequency of each “propagating word” over all the files; means for obtaining a predetermined iDF value representing the degree to which the “propagating word” appears in specific files; and means for extracting a set of words of interest as user profile information according to the “influence degree iDF value”, which is a function of the “influence degree” and the iDF value.
[Selected drawing] Figure 1

Description

  The present invention relates to a user interest analysis device, a control method thereof, and a computer program for realizing the method.

  In recent years, so-called interactive media in which users can participate, including the Internet, have grown in type and quantity to meet various needs. Particularly notable among them are bulletin board media, where information is posted anonymously and displayed in chronological order so that even people with very different personalities can exchange opinions; blog (web log) media, where information and opinions are exchanged through posts and the hyperlinks between them; and social media, which are communities that users join to share friendships, hobbies, and ideas.

  On such media, attempts have been made to profile and categorize users and to find keywords that strongly influence them. There are two reasons. First, if user “attributes” such as interests, preferences, needs, age, gender, region, occupation, and values can be evaluated, content and advertisements can be delivered to users accurately (this can be called targeted distribution). Second, it has been pointed out that the exchange of opinions between users influences purchasing decisions, so keywords that strongly influence users can be used for product development and marketing strategy.

  For this reason, in marketing practice, for example, analysts read bulletin boards and blogs and participate in communities to pick up influential keywords. However, because the criteria depend on each analyst's experience and sensibility, the results cannot be evaluated with a unified index, and analyzing a vast amount of media requires a great deal of resources.

  Surveys are another way to profile and categorize users. However, collecting a sufficient number of samples takes considerable effort. Moreover, for accurate targeted distribution after the surveys are collected, a respondent can be tracked only by association with unique identifying information or while the user is logged in, so the approach works only within a specific medium and is not general purpose. It has also been pointed out that survey responses are inaccurate for various reasons.

  In addition, as an attempt to automatically profile and categorize a user, a technique for analyzing a user's information browsing history and search condition input and holding information on the user's preference and interest as a user profile is known (for example, Patent Document 1).

  However, the above methods treat vocabulary frequency as an important factor in attribute evaluation and keyword determination. In highly anonymous media such as bulletin boards, for example, opinions become polarized and turn to slander, and it has been pointed out that vocabulary made frequent by such specific content is not necessarily influential. In blog media and social media as well, the frequent vocabulary is often not the influential vocabulary of the central topic but vocabulary from peripheral topics or general vocabulary. It is therefore difficult to extract truly influential keywords and to profile and categorize users accurately.

  A model for quantitatively finding influential keywords has therefore been proposed. It focuses on the process by which interest in words, that is, vocabulary, spreads in communication based on text information, which is the main component of interactive media (Non-Patent Document 1). This model expresses the strength with which context is dominated, that is, the spread of influence. It defines the amount of influence mediated by text content and vocabulary, and by measuring this amount it can extract keywords that are highly influential even if they are infrequent.

  In addition, an algorithm has been proposed that profiles and categorizes each user on such media by defining, as that user's profile, a feature derived from the set of influential keywords extracted as described above (Non-Patent Document 2).

Patent Document 1: JP 2003-67410 A
Non-Patent Document 1: MATSUMURA, Masahiro et al.: Dissemination model of influence in communication by text, JSAI Journal 17-3 SP-B, P259-267, 2002
Non-Patent Document 2: MATSUMURA, Masahiro et al.: Profiling Online Community Participants Based on Dissemination Models of Impact, JSAI Proceedings Vol.18, No.4, A, P165-172, 2003

  However, none of these proposals makes effective use of the direction and history of propagation, so none can estimate a user's changing interests in real time. A feature common to interactive media is that technologies such as replies, comments, links, and trackbacks let users write, exchange, and refer to opinions and information. By defining nodes and a viewing order over the time series of this writing, exchange, and reference, it is possible to define directed links that make effective use of the direction and history of propagation. In addition, because a user browses files (for example, web pages) according to his or her interests, the characteristic words that consistently appear in the set of files browsed reflect the user's interests at that time in real time.

  Therefore, the present invention proposes an algorithm that estimates a user's changing interests in real time, together with a device, method, and program that implement it. In a directed graph in which the files browsed by the user are nodes and the browsing order defines directed links, the algorithm recursively measures the appearance frequency of the words propagating between nodes and extracts a set of those words.

  (1) A user interest analysis device that extracts the words of interest of a user who browses files, comprising: means for inputting, as text, the words contained in one or more files browsed by the user, based on the user's browsing history information; means for dividing the text into morphemes, the smallest meaningful language units; means for extracting “propagating words” that the user referred to across the plurality of files browsed; means for storing the propagating words; means for obtaining a predetermined “influence degree” from the appearance frequency of a propagating word in the target files; means for obtaining a predetermined iDF value representing the degree to which the propagating word appears in specific files; means for extracting the user's words of interest as user profile information according to the “influence degree iDF value”, which is a function of the influence degree and the iDF value; and means for outputting the user profile information.

  According to the invention of (1), words that the user went on to refer to, through a link or the like, are first extracted from the history of files the user browsed on the Internet as the words propagating between files. Next, the degree of influence of each propagating word on the files referenced afterward is quantified. An iDF (Inverse Document Frequency) value, which reflects how widely the propagating word appears across all the files, is then calculated, and the words the user is interested in are detected from the influence degree iDF value, which is a function of the influence degree and the iDF value. Finally, a specific set of the detected words is output as user profile information together with their influence degree iDF values. With these functions, a user interest analysis device can be provided that analyzes a user's changing words of interest in real time.

  In addition, the profile information output by this user interest analysis device can be used to plan sales strategies for products related to the words the user is interested in, to deliver content and advertisements, and to send direct mail and the like to that user efficiently.

  (2) The user interest analysis device according to (1), further comprising means for disclosing the profile information to other users.

  According to the invention of (2), by learning the words other users are interested in within a community on the Internet, a user can find users who share an interest (friend search) and can find and ask questions of users who are likely to be knowledgeable in a field (master search).

  (3) The user interest analysis device according to (1) or (2), further comprising a similar word dictionary for detecting words related to the propagating word, and means for calculating the influence degree iDF value for the words related to the propagating word as well.

  (4) The user interest analysis device according to any one of (1) to (3), wherein the influence degree iDF value is obtained by a predetermined mathematical formula (described later).

  The device according to inventions (1) to (4) can also be realized as an equivalent control method and as a computer program that causes a computer to execute that control method.

  According to the present invention, in a directed graph with the files viewed by a user as nodes and the browsing order as directed links, a value that accounts for the influence and appearance frequency of the words propagating between nodes is measured recursively, and the set of words with the highest values is extracted, so that the user's changing interests can be estimated in real time.

  Hereinafter, embodiments of the present invention will be described with reference to the drawings.

  FIG. 1 shows an example of a functional block diagram of the user interest analysis apparatus according to the present invention. As shown in the figure, the user interest analysis apparatus 10 comprises a file text input means 2, a morpheme division means 3 (not essential), a propagation word extraction means 4, an influence degree calculation means 5, an iDF value calculation means 6, a storage means 7 used for temporary data storage, a user interest word extraction means 8, a profile information output means 9, and a synonym dictionary 11. This configuration is merely an example, and another configuration with equivalent functions may be used.

  First, the user interest analysis apparatus 10 receives the user's file browsing history 1 as input, and extracts text for each page by the file text input means 2. The file browsing history generally exists in a temporary storage file of an Internet browser, but may be browsing history information of a bulletin board or a blog.

  Next, if the extracted text consists of sentences, the morpheme dividing means 3 divides the sentences into the necessary units. The morpheme dividing step may be skipped, for example, when metadata in the page is used or when the extracted page consists only of words. Next, the propagation word extracting means 4 analyzes the group of files the user referred to during a certain period, or a necessary portion of it, and extracts the common words and the propagating words in the pages. A common word is a keyword that appears in every file. As described later, however, common words need not match exactly in each file; partial matches and synonyms are included.

  A propagating word is a word that triggered, or influenced, the user's move from one file to the next. Propagating words need not match exactly on each page; partial matches and synonyms are included. Synonyms are defined using a known thesaurus (synonym dictionary). Propagating words are described in more detail in the examples below.
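As a rough illustration of this extraction step, the sketch below collects propagating words from a list of page transitions. It assumes, beyond what the text specifies, that each transition records the word that triggered it (a search query or hyperlink text) and that a word counts as propagating only when it, or a synonym, appears on both the source and the destination page. Partial matching is not implemented. The names `extract_propagating_words`, `transitions`, `page_words`, and `synonyms` are hypothetical.

```python
def extract_propagating_words(transitions, page_words, synonyms=None):
    """Return trigger words that appear on both ends of a page transition.

    transitions: list of (source_page, destination_page, trigger_word)
    page_words:  dict mapping page id -> list of words on that page
    synonyms:    optional dict mapping a word to its canonical form
    (All three structures are assumptions made for this sketch.)
    """
    synonyms = synonyms or {}

    def canonical(word):
        # Map a word to its canonical synonym, or leave it unchanged.
        return synonyms.get(word, word)

    propagating = []
    for src, dst, trigger in transitions:
        word = canonical(trigger)
        src_words = {canonical(w) for w in page_words[src]}
        dst_words = {canonical(w) for w in page_words[dst]}
        if word in src_words and word in dst_words and word not in propagating:
            propagating.append(word)
    return propagating


# Illustrative data only; not taken from the figures.
page_words = {
    1: ["liquid crystal TV", "product A"],
    2: ["liquid crystal TV", "product A"],
    3: ["LCD TV", "product A", "product B"],
}
transitions = [(1, 2, "product A"), (2, 3, "LCD TV")]
synonyms = {"LCD TV": "liquid crystal TV"}
result = extract_propagating_words(transitions, page_words, synonyms)
```

Here "LCD TV" is folded into "liquid crystal TV" through the synonym table, so both triggers are reported in the order they were first followed.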

  Next, for each of the one or more propagating words extracted, the influence degree calculating means 5 obtains the influence degree, which indicates the influence of the propagation, and the iDF value calculating means 6 obtains the iDF (Inverse Document Frequency) value, which is based on the number of files in which the propagating word appears. The influence degree of propagation is a quantity representing the influence (weight) of a propagating word on the files referenced afterward. For example, the definition of the TF (Term Frequency) value can be applied. The TF value generally represents the frequency with which the target word occurs in the target document. In the present invention, the document is the group of files viewed by the user or a necessary portion of it. Hereinafter, the influence degree is referred to as EDT (Effect of Diffusible Term).

The iDF value is a function of the number of target documents in which the target word appears, and is generally defined so that it decreases as that number increases. Hereinafter, the product of the influence degree described above and this iDF value is referred to as the “influence degree iDF value”. The influence degree iDF value may follow the general TFiDF formula proposed by G. Salton (G. Salton, M. McGill, Introduction to Modern Information Retrieval, New York, McGraw-Hill, 1983) or a modified form of it; it suffices that the formula focuses on propagation and defines the influence degree. As an example of an embodiment of the present invention, the calculation uses the following formula (1).

$$\text{EDT-iDF}(t) = \mathrm{EDT}(t) \times \mathrm{iDF}(t), \qquad \mathrm{iDF}(t) = \frac{N}{\log\left(\mathrm{DF}(t) + 1\right)} \tag{1}$$

Here,
t is the propagating word,
EDT(t) is the frequency with which the propagating word t appears in the group of files the user browsed during a predetermined time,
N is the total number of files viewed by the user during the predetermined time, and
DF(t) is the number of files containing the propagating word t.
The predetermined time refers to a target period analyzed by the user analysis device, and can be determined individually according to the analysis target and needs. For example, it may be several hours or months.

  In the above example, the general definition of the TF value is used as the influence degree, but the influence degree may be defined in another way. The iDF(t) formula need not use a logarithm. When a logarithm is used, its base may be 10, the base e of the natural logarithm, 2, or the like. Each of the influence degree calculating means 5 and the iDF value calculating means 6 can therefore be selected from a plurality of formulas, and each includes a plurality of corresponding means. In FIG. 1, these are shown as 5a, 5b, 6a, and 6b.
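The calculation can be sketched as a small function. This is a minimal sketch that follows the form EDT(t) × N / log(DF(t) + 1) used in the worked calculation for FIG. 5; the function name `influence_idf` and the `base` parameter are hypothetical, with `base` standing in for the choice of logarithm base mentioned above.

```python
import math


def influence_idf(edt, n_files, df, base=10.0):
    """Influence degree iDF value: EDT(t) * N / log_base(DF(t) + 1).

    edt:     total occurrences of the propagating word in the browsed files
    n_files: total number of files browsed in the analysis period (N)
    df:      number of browsed files that contain the word (DF)
    base:    logarithm base; 10, e, or 2 are the options named in the text
    """
    if df < 1:
        raise ValueError("DF(t) must be at least 1 for a word that appears")
    idf = n_files / math.log(df + 1, base)
    return edt * idf


# The "product C" figures from the text: EDT = 10, N = 8, DF = 6.
score_base10 = influence_idf(10, 8, 6)
score_base_e = influence_idf(10, 8, 6, base=math.e)
```

With base 10 the result rounds to 94.7, the value given in the text. Changing the base rescales every score by the same factor, so the ranking of words is unaffected by that choice.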

  Further, using the calculated influence degree and iDF value, the user interest word extraction means 8 obtains the influence degree iDF value for each of the propagating words extracted earlier and, according to this value, extracts the words the user is interested in. For example, words with a large influence degree iDF value can be extracted as the words the user is interested in.

  Finally, the profile information output means 9 compares the influence degree iDF value with a predetermined threshold value and outputs the user's profile.
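The thresholding step might look like the following sketch. The text does not say whether the comparison is inclusive, so treating scores equal to the threshold as passing is an assumption, as are the name `build_profile` and the placeholder scores.

```python
def build_profile(scores, threshold):
    """Return (word, score) pairs at or above the threshold, highest first.

    scores:    dict mapping a word to its influence degree iDF value
    threshold: predetermined cut-off (inclusive here, by assumption)
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(word, score) for word, score in ranked if score >= threshold]


# Placeholder scores for illustration only.
scores = {"word x": 12.0, "word y": 80.0, "word z": 45.0}
profile = build_profile(scores, threshold=40.0)
```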

  FIG. 2 shows a user interest analysis apparatus according to another embodiment. The apparatus of FIG. 2 is realized by a general computer system. That is, it is composed of a CPU 21, an input unit 22, an output unit 23, a communication unit 24, a program memory 25, a work memory 26, and a user profile 27. The synonym dictionary 11 described above may be added as an option.

The input unit 22 may be a general input device such as a mouse or a keyboard that receives an operation input from a user, and the output unit 23 may be a display such as a liquid crystal display or a CRT. The communication unit 24 transmits / receives data to / from a LAN or Internet communication network.
The program memory 25 stores a program having each function of the apparatus executed by the CPU 21. That is, programs such as a control unit for the entire apparatus, a keyword extraction unit for extracting a keyword from an input file, an influence iDF value calculation unit for obtaining an influence iDF value with a predetermined algorithm, and a profile creation unit are stored. The program does not need to be divided according to function, and may be configured by a single program.

  The program memory 25 may be a ROM or a flash memory, or a RAM loaded from a hard disk (HDD). The work memory 26 temporarily stores intermediate data processed by the CPU 21 and is generally composed of a RAM or an HDD.

  The user profile 27 is a storage unit that stores a result of executing a program stored in the program memory 25. Further, as already described, the synonym dictionary 11 is a dictionary for defining synonyms for a word group extracted from the text, and is referred to as necessary by the keyword extraction unit.

  FIG. 3 is a diagram showing the concept of the propagating word described above. As an example, the figure shows the history of pages viewed by a certain user. First, the user finds an interesting word on page 1. To examine it in more detail, the user enters the word into a search page and searches, or, if the word itself is a hyperlink, clicks it, and so browses page 2. The transition may of course be made by means other than search and hyperlinks. The user similarly moves from page 2 to page 3 and browses it, but because page 3 says nothing about the word of interest, the user returns to page 2, browses another page 4 from page 2, and then moves on from page 4 to page 5.

  The browsing history of such a page can be expressed by a directed graph in which the viewed page is a node and the browsing order from the page to the page is a directed link (edge). A directed graph is a graph in which the edge between nodes has directionality.
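The browsing history of FIG. 3 can be written down as such a directed graph. The sketch below is one possible representation, an adjacency mapping built from the visit sequence; recording the return from page 3 to page 2 as an ordinary directed link is an assumption, and `build_browsing_graph` is a hypothetical name.

```python
from collections import defaultdict


def build_browsing_graph(visits):
    """Map each page to the pages visited directly after it, in order."""
    graph = defaultdict(list)
    # Pair every visit with the one that follows it to form directed links.
    for src, dst in zip(visits, visits[1:]):
        graph[src].append(dst)
    return dict(graph)


# Visit order from the FIG. 3 description: 1 -> 2 -> 3 -> back to 2 -> 4 -> 5.
visits = [1, 2, 3, 2, 4, 5]
graph = build_browsing_graph(visits)
# graph == {1: [2], 2: [3, 4], 3: [2], 4: [5]}
```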

  As shown in the figure, pages 1 to 5 viewed by the user all contain the “common word” 36, but the word the user is interested in is not necessarily the common word 36. It is more often the word entered in the search bar or the word carried by the hyperlink. A word that triggers a jump between pages in this way is called a “propagating word” (shown as propagating word 37 in the figure). In other words, a propagating word is considered to represent the user's interest in real time better than a word that merely happens to appear on every page (a common word). If only common words are extracted in order of frequency, general words such as “product”, “Internet”, “do”, and “is” rank highest, and it is not easy to find the words the user is really interested in (influential words). Note therefore that, in the present invention, the words propagating between files indicate the user's interest most directly, and that the way interest changes can be tracked in real time by adjusting the upper limit on the number of files referenced or the target period of analysis.

  FIG. 4 shows an example in which a user's words of interest change during file browsing. First, the user learns from new product news 41 (page 1) that company X has released product A, the latest model of liquid crystal TV. Having long been interested in liquid crystal TVs, the user immediately goes to company X's product information site 42 (page 2) and looks at the information on product A. While looking at the details, the user wants to compare product A with similar products from other companies and displays, on price comparison site 43 (page 3), a list of liquid crystal TVs from multiple manufacturers. Here the user becomes interested in company Y's product B, which has the same functions as company X's new product A but is cheaper. The user therefore jumps to company Y's product information site 44 (page 4) and looks at the information on product B. There the user learns that product B is the successor to product C and is priced considerably higher than product C, becomes interested in product C, and browses the information on product C on product information site 45 (page 5) of the same company Y. Now more interested in product C, the user returns to the price comparison site 46 (page 6) to find the store offering the lowest price and finds shop Z. The user moves to the shop Z site 47 (page 7), finally decides to purchase, and orders product C from the purchase page 48 (page 8).

  Under the above scenario, when all the text on pages 1 to 8 followed by the user is analyzed with the user interest analysis device, the keywords “liquid crystal TV”, “Company X”, “product A”, “Company Y”, “product B”, and “product C” are extracted. The term “liquid crystal TV” appears on all pages, whereas “product A”, “product B”, and “product C” are assumed to appear many times on the manufacturer's pages that explain each product's specifications. For example, as shown in the figure, “liquid crystal TV” appears once on each page, and “product B” and “product C” each appear five times on their specification pages at company Y. In this example the user does not refer to the specifications of product A, so “product A” appears once on each of pages 1, 2, and 3. The user was initially interested in product A but gradually shifted to product B and then product C and finally ordered product C, so “product C” appears once on page 3, once on page 4, five times on page 5, and once on each of pages 6 to 8.

  FIG. 5 shows an example in which the influence degree iDF value of each keyword in the example of FIG. 4 is actually calculated using formula (1) above. The total number of pages N referred to by the user is 8. For example, “product C” appears on six pages, pages 3 to 8, so its DF value is 6. Because “product C” appears five times on page 5 and once on each of pages 3, 4, 6, 7, and 8, its influence degree is 5 + 1 + 1 + 1 + 1 + 1 = 10. The influence degree iDF value is therefore 10 * 8 / log(6 + 1) = 94.7 (base-10 logarithm). Obtaining the influence degree iDF values of the other keywords in the same way and arranging them in descending order gives the table shown in FIG. 5.
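The arithmetic above can be reproduced from per-page occurrence counts. In this sketch, EDT is taken as the sum of the counts and DF as the number of pages with at least one occurrence, which matches how the text derives 10 and 6 for “product C”. The function name `influence_idf_from_counts` is hypothetical, and the “liquid crystal TV” counts follow the statement that the term appears once on each of the eight pages.

```python
import math


def influence_idf_from_counts(page_counts, n_pages):
    """Compute EDT * N / log10(DF + 1) from a {page: occurrences} mapping."""
    occurrences = [count for count in page_counts.values() if count > 0]
    edt = sum(occurrences)   # total occurrences across the browsed pages
    df = len(occurrences)    # number of pages on which the word appears
    return edt * n_pages / math.log10(df + 1)


N_PAGES = 8
product_c = {3: 1, 4: 1, 5: 5, 6: 1, 7: 1, 8: 1}
lcd_tv = {page: 1 for page in range(1, N_PAGES + 1)}

score_c = influence_idf_from_counts(product_c, N_PAGES)   # about 94.7
score_tv = influence_idf_from_counts(lcd_tv, N_PAGES)
```

The common word scores lower than “product C” even though it appears on every page, because the larger DF enlarges the denominator while its occurrence total stays small.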

  As this table shows, “liquid crystal TV” is a common word that appears on all pages, yet its influence degree iDF value is low, while “product C” and “product B” are seen to reflect the user's interest much more strongly. A user profile can thus be created by collecting the set of words with the highest influence degree iDF values. A threshold value given in advance, or the like, may be used to select the top-ranked words.

  FIG. 6 shows a specific example in which a user's words of interest on an Internet bulletin board change. In this figure, the following message exchanges between A, B, C, and D are shown.

  Mr. A decides to take a trip soon and, to find accommodation at his destination, posts on the bulletin board: “Please tell me about recommended hotels and the like. I would like to travel around Hakodate for 3 nights and 4 days” (61). Mr. B replies, “For Hakodate I recommend X Hotel. It is beautiful and reasonably priced” (62). Mr. C replies, “Do you like hot springs, Mr. A? If you go to Hakodate, there are good hot spring inns” (63). Mr. A promptly thanks Mr. B and Mr. C and answers Mr. C, “Thank you, Mr. C. I love hot springs” (64). Mr. C then introduces Y inn and Z inn (65). Meanwhile, Mr. D, who has been watching the exchange, writes, “If you are going as far as Hakodate, why not go on to Jozankei? I recommend it” (66) and introduces Q Hotel and R Ryokan in Jozankei with links to their websites. Seeing this, Mr. A thinks it would not be bad to go on from Hakodate to Jozankei and replies, “I will check the hot spring inns in Jozankei right away” (67). Mr. A then checks the homepages (68, 69) of Q Hotel and R Ryokan at the URLs introduced by Mr. D and finally makes a reservation at R Ryokan. Hereinafter, comments 61 to 69 (including the homepages of Q Hotel and R Ryokan) are referred to as pages 1 to 9.

  The main keywords on page 1 of this exchange are “Hakodate”, “Travel”, “Recommended”, and “Hotel”. On page 2 they are “Hakodate”, “Hotel”, “Recommended”, “Beautiful”, “Price”, “Reasonable”, and so on. Keywords are extracted from page 3 to page 8 in the same way, and, excluding words unlikely to serve as keywords such as conjunctions and particles, they are arranged in descending order of influence degree iDF value in the table of FIG. 7.

  At first, Mr. A posted on the bulletin board intending to find a recommended hotel in Hakodate. After seeing Mr. C's comment he became interested in hot springs, and the links Mr. D posted to the hot spring inns' homepages became the deciding factor: he booked a hot spring inn in Jozankei, quite far from Hakodate, his original destination.

  As this example shows, the word that most influenced Mr. A's behavior (interest) is clearly “hot spring”. The table of FIG. 7 shows the same thing: “hot spring” has the highest influence degree iDF value. It can also be seen that “Hakodate” and “Travel”, which interested Mr. A at first, rank low, indicating that his interest in them gradually faded.

  As described above, by using the user interest analysis apparatus of the present invention to analyze, in time series, the pages a user browsed during a predetermined period, the user's interests can be examined in real time. If a word with a great influence on the user's interest can be found (“hot spring” in the above example), a large amount of such information can be collected and used as a product planning and marketing tool.

  FIG. 8 is a diagram illustrating an example of calculating the influence degree iDF value with synonyms taken into account in the bulletin board example of FIG. 6. “Hotel” and “ryokan” are defined as synonyms and treated as a single word, “hotel / ryokan”, which is compared with the other top three words in FIG. 7. Because “hotel” and “ryokan” are treated as one word, the total appearance frequency increases and the influence degree iDF value increases with it. There is thus no doubt that the user is interested in accommodation, whether an inn or a hotel. Even so, the influence degree iDF value of “hotel / ryokan” does not reach that of “hot spring”. The purpose of the user interest analysis apparatus is to find such influential words, and the formula best suited to obtaining the influence degree iDF value can accordingly be selected from a plurality of formulas.
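Treating synonyms as one word amounts to merging their per-page counts before the influence degree and DF are computed. The following is a minimal sketch of that merge; the counts are invented for illustration and are not the values behind FIG. 8, and the names `merge_synonyms` and `canonical` are hypothetical.

```python
from collections import Counter


def merge_synonyms(page_counts, canonical):
    """Combine per-page counts of words that share a canonical form.

    page_counts: dict mapping word -> {page: occurrences}
    canonical:   dict mapping a word to the merged label it belongs to
    """
    merged = {}
    for word, per_page in page_counts.items():
        label = canonical.get(word, word)
        # Counter.update adds counts page by page instead of overwriting.
        merged.setdefault(label, Counter()).update(per_page)
    return merged


# Invented counts, for illustration only.
page_counts = {
    "hotel": {1: 1, 2: 1},
    "ryokan": {2: 1, 5: 2},
    "hot spring": {3: 1, 4: 1},
}
canonical = {"hotel": "hotel / ryokan", "ryokan": "hotel / ryokan"}
merged = merge_synonyms(page_counts, canonical)

edt = sum(merged["hotel / ryokan"].values())  # merged occurrence total
df = len(merged["hotel / ryokan"])            # pages containing either word
```

Merging raises the occurrence total, as the text notes, but it can also raise DF, so the net effect on the score depends on how the two words are spread over the pages.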

  FIG. 9 is a diagram illustrating another application example of the user interest analysis apparatus. Suppose that each terminal (91 to 93) of user A, user B, and user C is provided with a user interest analysis device and that each user has agreed to publish his or her profile, the output of the device, via the Internet 94. If the profile contains information a user wants to keep private, it may of course be published with that information excluded, or made available only to member users rather than to the general public. The published user profile information is accumulated in the public profile DB 96 of the profile server 95, where user profile tables A, B, and C (97 to 99) are created, one per user. Each profile table lists the user's words of interest together with their ranks. Making these tables public lets them serve as a tool for forming various communities.

  For example, when user A is interested in “fishing”, a user with the same hobby can be searched for in the public profile DB 96. In this case the user interest analysis device serves as a “friend search” tool. In this example, words related to “fishing” rank high in profile table C 99 of user C, so user A learns that user C shares the hobby and may contact user C directly. Because user A also knows the words user C is interested in, their conversation can be expected to be lively.

  In addition, the public profile DB 96 may make available not only the ranking of the influence degree iDF values of the words of interest but also the total number of pages in which each word of interest appears, the EDT value, the period covered by the page history, and so on. This makes it possible to judge how large a part (volume) a word of interest plays in the user's profile. For example, if the total number of pages on which “fishing” or its synonyms appear (among the pages user C browsed in a predetermined period) is extremely large, user C can be presumed to be a fishing enthusiast or an expert. In other words, the user interest analysis device can also serve as a “master search” tool.
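The two lookups against the public profile DB could be sketched as follows. The in-memory dictionaries stand in for the profile tables, and the cut-offs `top_k` and `min_pages` are assumptions, since the text only says that the word ranks high or that the page count is extremely large. All names and data here are hypothetical.

```python
def find_friends(profiles, word, exclude=None, top_k=3):
    """Users whose top_k ranked words of interest include the given word.

    profiles: dict mapping user -> list of (word, score), highest first
    """
    matches = []
    for user, ranked in profiles.items():
        if user == exclude:
            continue
        top_words = [w for w, _ in ranked[:top_k]]
        if word in top_words:
            matches.append(user)
    return matches


def find_masters(page_totals, word, min_pages):
    """Users who browsed at least min_pages pages containing the word.

    page_totals: dict mapping user -> {word: number of pages containing it}
    """
    return [user for user, totals in page_totals.items()
            if totals.get(word, 0) >= min_pages]


# Invented profile tables, for illustration only.
profiles = {
    "A": [("fishing", 50.0), ("camping", 20.0)],
    "B": [("cooking", 40.0)],
    "C": [("fishing", 90.0), ("boat", 30.0)],
}
page_totals = {"A": {"fishing": 8}, "B": {"cooking": 12}, "C": {"fishing": 120}}

friends = find_friends(profiles, "fishing", exclude="A")
masters = find_masters(page_totals, "fishing", min_pages=100)
```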

  As described above, the user profile information obtained by the user interest analysis device of the present invention is not only useful as a product planning or marketing tool as the number of users who publish it increases, but also for finding friends for personal hobbies. It can also be used as a tool for searching for experts. Also, for users who don't want to share their profile with the general public, if they decide to share it only with their family members or other enthusiastic friends, when they want to give each other gifts, invite them to a trip, invite them to a meal, etc. As basic information, there is a possibility that it can be used for various purposes.

  Although the present invention has been described above using embodiments and examples, the technical scope of the present invention is not limited to the embodiments and the like described above. Various modifications or improvements can be made to the above embodiments.

  Note that the user interest analysis device according to the embodiment of FIG. 1 or FIG. 2 of the present invention can also be realized by a program on a computer. The storage medium storing the program can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device). Examples of this storage medium include a semiconductor or solid state storage device, magnetic tape, a removable floppy disk, random access memory (RAM), read only memory (ROM), a rigid magnetic disk, and an optical disk. Current examples of optical disks include compact disk-read only memory (CD-ROM), compact disk-read/write (CD-R/W), and DVD.

FIG. 1 is a diagram showing functional blocks in one embodiment of the user interest analysis device according to the present invention.
FIG. 2 is a diagram showing functional blocks in another embodiment of the user interest analysis device according to the present invention.
FIG. 3 is a diagram showing the concept of propagating words and the directed graph between pages according to the present invention.
FIG. 4 is a diagram showing a specific example of file browsing as Example 1 of the present invention.
FIG. 5 is a diagram showing a specific example of calculation of the influence degree iDF value in the example of FIG.
FIG. 6 is a diagram showing, as Example 2 of the present invention, a specific example in which a user's words of interest change on a bulletin board.
FIG. 7 is a diagram showing a specific example of calculation of the influence degree iDF value in the embodiment of FIG. 6.
FIG. 8 is a diagram showing a specific example of calculation of the influence degree iDF value considering synonyms in the embodiment of FIG. 6.
FIG. 9 is a diagram showing, as Example 3 of the present invention, a profile server and profile tables through which a user profile can be disclosed to other users.

Explanation of symbols

1 File browsing history
2 File text input means
3 Morphological division means
4 Propagation word extraction means
5 Influence degree calculation means
5a, 5b Influence degree calculation means
6 iDF value calculation means
6a, 6b iDF value calculation means
7 Storage means
8 User interest word extraction means
9 Profile information output means
10 User interest analysis device (first embodiment)
11 Synonym dictionary
20 User interest analysis device (second embodiment)
21 CPU
22 Input unit
24 Output unit
24 Communication unit
25 Program memory
26 Work memory
27 User profile
36 Common words
37 Propagating words
41 New product news
42 Product information site
43 Price comparison site
44 Y company product information site (product B specification page)
45 Y company product information site (product C specification page)
46 Price comparison site
47 Shop Z site
48 Purchase page
61-69 Page 1-9
91-93 User terminal
94 Internet
96 Profile server
97-99 Profile table

Claims (9)

  1. A user interest analysis device that extracts words of interest of a user browsing files, the device comprising:
    means for inputting, as text for each file, a plurality of words included in the files from the history of files viewed by the user;
    means for dividing the text into predetermined units;
    means for extracting a propagating word referred to by the user among the plurality of files viewed by the user;
    means for storing one or more of the propagating words;
    means for obtaining, from the appearance frequencies of the propagating word across the plurality of files, a predetermined influence degree and a predetermined iDF value representing the degree to which the propagating word appears in a specific file;
    means for extracting a set of words of interest of the user as user profile information according to an influence iDF value that is a function of the influence degree and the iDF value; and
    means for outputting the user profile information.
  2. The user interest analysis device according to claim 1, further comprising means for disclosing the user profile information to other users.
  3. The user interest analysis device according to claim 1, further comprising:
    a similar word dictionary for detecting words related to the propagating word; and
    means for calculating the influence degree iDF value for the words related to the propagating word.
  4. The user interest analysis device according to claim 1, wherein the influence iDF value is obtained by the following mathematical formula.
    where
    t is the propagating word,
    EDT is the frequency at which the propagating word t appears in the file group viewed by the user,
    N is the number of files viewed by the user during a predetermined time, and
    DF (t) is the number of files including the propagating word t.
  5. A user interest analysis method for extracting words of interest of a user browsing files, the method including:
    inputting, as text for each file, a plurality of words included in the files from the history of files viewed by the user;
    dividing the text into morphemes of predetermined units;
    extracting a propagating word referred to by the user among the plurality of files viewed by the user;
    storing one or more of the propagating words;
    obtaining, from the appearance frequencies of the propagating word across the plurality of files, a predetermined influence degree and a predetermined iDF value representing the degree to which the propagating word appears in a specific file;
    extracting a set of words of interest of the user as user profile information in descending order of the influence iDF value, which is the product of the influence degree and the iDF value; and
    outputting the user profile information.
  6. A user interest analysis computer program that extracts words of interest of a user browsing files, the computer program causing a computer to execute:
    inputting, as text for each file, a plurality of words included in the files from the history of files viewed by the user;
    dividing the text into morphemes, the smallest linguistic units having meaning;
    extracting a propagating word referred to by the user among the plurality of files viewed by the user;
    storing one or more of the propagating words;
    obtaining, from the appearance frequencies of all the propagating words, a predetermined influence degree and a predetermined iDF value representing the degree to which the propagating word appears in a specific file;
    extracting a set of words of interest of the user as user profile information according to an influence iDF value that is a function of the influence degree and the iDF value; and
    outputting the user profile information.
  7. The user interest analysis device according to claim 1, wherein the file is a Web page.
  8. The user interest analysis method according to claim 5, wherein the file is a Web page.
  9. The computer program according to claim 6, wherein the file is a Web page.
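  The calculation named in claims 4 and 5 can be sketched as follows. The formula referenced in claim 4 is not reproduced in this text, so the sketch assumes the conventional logarithmic form, influence iDF(t) = EDT(t) × log(N / DF(t)), which agrees with the product stated in claim 5 but is otherwise an assumption. The rule used here to pick propagating words (a word shared by two consecutively viewed files) and all function names are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set

# Each viewed file is represented as a list of already-divided words (morphemes).
Files = List[List[str]]


def propagating_words(files: Files) -> Set[str]:
    """Assumed rule: a word propagates if it occurs in two consecutively viewed files."""
    found: Set[str] = set()
    for prev, cur in zip(files, files[1:]):
        found |= set(prev) & set(cur)
    return found


def influence_idf(files: Files) -> Dict[str, float]:
    """Score each propagating word t as EDT(t) * log(N / DF(t)) (assumed form)."""
    n = len(files)                                          # N: number of files viewed
    edt = Counter(w for f in files for w in f)              # EDT: occurrences in the file group
    df = Counter(w for f in files for w in set(f))          # DF: files containing the word
    return {t: edt[t] * math.log(n / df[t]) for t in propagating_words(files)}


def profile(files: Files, top_k: int = 5) -> List[str]:
    """Words of interest in descending order of score; ties broken alphabetically."""
    scores = influence_idf(files)
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Illustrative browsing history; the words are invented for the example.
files = [
    ["camera", "price", "new"],
    ["camera", "price", "lens", "camera"],
    ["lens", "shop"],
    ["weather"],
]
```

  Under these assumptions, a word that appears in every viewed file scores zero because log(N / DF(t)) is zero, so only words concentrated in part of the history rise in the profile.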

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2006085174A JP2007264718A (en) 2006-03-27 2006-03-27 User interest analyzing device, method, and program

Publications (1)

Publication Number Publication Date
JP2007264718A true JP2007264718A (en) 2007-10-11

Family

ID=38637697

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2006085174A Pending JP2007264718A (en) 2006-03-27 2006-03-27 User interest analyzing device, method, and program

Country Status (1)

Country Link
JP (1) JP2007264718A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8095652B2 (en) 2008-02-29 2012-01-10 International Business Machines Corporation Analysis system, information processing apparatus, activity analysis method and program product
JP2010128981A (en) * 2008-11-28 2010-06-10 Nippon Telegr & Teleph Corp <Ntt> Method, device and program for extracting operation sequence
JP2011146004A (en) * 2010-01-18 2011-07-28 Zigsow Kk User profiling system using web community site
WO2012176317A1 (en) * 2011-06-23 2012-12-27 サイバーアイ・エンタテインメント株式会社 Image recognition system-equipped interest graph collection system using relationship search
JPWO2012176317A1 (en) * 2011-06-23 2015-02-23 サイバーアイ・エンタテインメント株式会社 Interest graph collection system by relevance search with image recognition system
US9600499B2 (en) 2011-06-23 2017-03-21 Cyber Ai Entertainment Inc. System for collecting interest graph by relevance search incorporating image recognition system
JP2013105364A (en) * 2011-11-15 2013-05-30 Nippon Telegr & Teleph Corp <Ntt> Document feature extraction device, document feature extraction method, and document feature extraction program
WO2015190474A1 (en) * 2014-06-12 2015-12-17 Emotion Intelligence株式会社 Perk management system and perk management method
JP2016001422A (en) * 2014-06-12 2016-01-07 Emotion Intelligence株式会社 Privilege management system and privilege management method
US10198426B2 (en) 2014-07-28 2019-02-05 International Business Machines Corporation Method, system, and computer program product for dividing a term with appropriate granularity
CN106506234A (en) * 2016-12-05 2017-03-15 深圳市彬讯科技有限公司 A kind of SOA services monitor in real time is reported and service performance metrics method
CN106506234B (en) * 2016-12-05 2019-09-10 深圳市彬讯科技有限公司 A kind of SOA service real time monitoring reports and service performance metrics method

Legal Events

Date Code Title Description
20081224 A621 Written request for application examination Free format text: JAPANESE INTERMEDIATE CODE: A621
20101215 A977 Report on retrieval Free format text: JAPANESE INTERMEDIATE CODE: A971007
20101221 A131 Notification of reasons for refusal Free format text: JAPANESE INTERMEDIATE CODE: A131
20110221 A521 Written amendment Free format text: JAPANESE INTERMEDIATE CODE: A523
20110329 A02 Decision of refusal Free format text: JAPANESE INTERMEDIATE CODE: A02
20120312 RD04 Notification of resignation of power of attorney Free format text: JAPANESE INTERMEDIATE CODE: A7424