WO2011102076A1 - Information organizing system and information organizing method - Google Patents
Information organizing system and information organizing method
- Publication number
- WO2011102076A1 (PCT/JP2011/000210)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information
- log data
- extended
- reference information
- data
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
Definitions
- The present invention relates to an information organizing system and an information organizing method for analyzing and organizing a large amount of information, and more particularly to a system and method capable of efficiently extracting and displaying information of high importance to a user.
- FIG. 10 is a diagram illustrating an example of a general data format of GPS log data.
- Based on data such as geodetic coordinates periodically collected by a GPS logger, the corresponding points on a map can be thinned out at appropriate intervals, and the resulting point group can be displayed as a set of line segments.
- A user can make use of such route information, for example, when recording and creating a travel journal.
- Background technology also includes a technique for displaying a photograph taken by a user on a map in association with the shooting location.
- A function equivalent to the GPS logger can be given to the camera so that geodetic coordinate information is attached to the photo data. It is also possible to use the photo shooting time and the GPS log data to search for the GPS log point recorded at the time closest to the shooting time, and to display the photo as if it had been taken at that point.
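- The time-based matching of a photo to a GPS log point described above can be sketched as follows. This is only an illustrative sketch: the function name `nearest_log_point`, the `(timestamp, latitude, longitude)` tuple layout, and the use of plain seconds as timestamps are assumptions, not part of the original description.

```python
from bisect import bisect_left


def nearest_log_point(log, photo_time):
    """Return the (timestamp, lat, lon) entry closest in time to photo_time.

    Assumes `log` is non-empty and sorted by timestamp (hypothetical layout).
    """
    times = [t for t, _, _ in log]
    i = bisect_left(times, photo_time)
    if i == 0:
        return log[0]
    if i == len(log):
        return log[-1]
    before, after = log[i - 1], log[i]
    # Pick whichever neighbour is closer in time (ties go to the earlier one).
    if photo_time - before[0] <= after[0] - photo_time:
        return before
    return after


# Hypothetical log: one point per minute.
gps_log = [(0, 35.000, 135.000), (60, 35.001, 135.002), (120, 35.003, 135.004)]
photo_location = nearest_log_point(gps_log, 85)
print(photo_location)
```

- Binary search is used here only because the log is assumed to be sorted by time; a linear scan would give the same result for small logs.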
- Fig. 11 shows an information organization system related to such background technology.
- The information organization system includes a user device 201, a data reading device 202, a server 203, and a user terminal 204.
- GPS log data from a GPS logger, photo data from a digital camera, and the like are uploaded from the user device 201 to the server 203 via the data reading device 202.
- The server 203 automatically performs processing such as association with map information, creates a corresponding file (for example, in html format), and outputs it to the user terminal 204.
- The user displays and checks the file 205 created by the server 203 on the user terminal 204.
- The server 203 can also use a third-party API (Application Programming Interface) that provides a map utility when realizing such a function.
- In this way, a blog or the like can be created automatically without the user performing complicated operations such as analyzing the GPS log data, editing the corresponding map information, and mapping and displaying the corresponding photo data.
- Handling GPS log data and photo data results in an enormous amount of data, so it is necessary to select the information and items needed for display.
- For example, the redundant part of the GPS log data is automatically thinned out, and either all the pictures taken are displayed or they are selected by a predetermined rule (for example, a predetermined number is displayed).
- If information that is not explicitly included in the log data, for example the nearby sights that the user is interested in, is automatically displayed in the template, the value of the information increases.
- For this purpose, a database of sights that can generally serve as features is prepared, and user preference information is registered in advance. Then, to select the items related to the log data, the relationship between the N log data points (appropriately thinned out in some cases) and the M landmarks registered in the landmark database is calculated. The calculation for extracting the relationship can be automated by carrying it out according to a rule or criterion determined manually in advance.
- For example, the physical distance from each famous place is computed from the GPS log data, and places within a certain radius are selected as candidates.
- The candidates are then narrowed down to those judged to be highly relevant using user preference information (for example, category information such as a genre of interest) and the like.
- In this approach, an index quantified by some method is obtained by rigorous calculation N × M times and sorted, and several further operations are performed to narrow the result down based on user preference information.
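- The exhaustive N × M procedure described above can be sketched as follows, to make its cost visible. This is a minimal sketch under assumptions: the haversine great-circle distance, the dictionary layout of a landmark, and all names and coordinates are hypothetical and chosen only for illustration.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def brute_force_candidates(log_points, landmarks, radius_km, liked_categories):
    """Evaluate all N x M pairs, keep landmarks within the radius, filter by category."""
    best = {}
    for p in log_points:            # N log points
        for lm in landmarks:        # M landmarks -> N x M distance evaluations
            d = haversine_km(p, lm["pos"])
            if d <= radius_km and lm["category"] in liked_categories:
                best[lm["name"]] = min(d, best.get(lm["name"], math.inf))
    return sorted(best, key=best.get)  # closest landmark first


# Hypothetical data.
log_points = [(35.0, 135.0), (35.0, 135.01)]
landmarks = [
    {"name": "Temple", "pos": (35.0, 135.001), "category": "history"},
    {"name": "Shrine", "pos": (35.0, 135.004), "category": "history"},
    {"name": "Mall", "pos": (35.0, 135.0105), "category": "shopping"},
    {"name": "Castle", "pos": (36.0, 136.0), "category": "history"},
]
print(brute_force_candidates(log_points, landmarks, 1.0, {"history"}))
```

- Every log point is compared with every landmark, which is the cost that becomes a problem as N and M grow.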
- Patent Document 1 discloses a technique in which the user's preference is dynamically read from log data, without complicated operations such as the user inputting or changing preference-related information, and the optimum information at the time of information distribution is delivered quickly and efficiently.
- In that technique, when the CM content recommendation server receives predetermined information from the user terminal via the distribution management server, the user is identified by the user ID and the log data is accumulated in a DB. Information related to the user's preference is detected while referring to that DB, and a distribution schedule is created so that CM content suitable for the user is transmitted to the user terminal based on the preference information. The streaming distribution server then reads the CM content from the CM content storage DB according to the distribution schedule and distributes it to the user terminal.
- However, the information organizing system described above has a problem: it is difficult to quickly and automatically display summary information by selecting, from a large amount of log data, information that is highly important to the user and considered characteristic.
- One reason is that the relative importance of each piece of content data is not included in the log data, and it is difficult for the user to enter it because the user cannot judge relative importance until log collection has finished.
- Another reason is that, even when a group of information that can be candidates for characteristic information is stored in a database in advance and highly relevant information is extracted in consideration of importance information that differs for each user, there are so many candidates that quick display is difficult.
- An object of the present invention is to provide an information organizing system and an information organizing method capable of selecting, from a large amount of log data, information that is highly important to the user and considered characteristic, and of displaying summary information quickly and automatically.
- An information organization system according to the present invention includes: a reference information database that holds reference information; generalized expression means for mapping metric and non-metric data to a space such that the more similar the data are, the closer the distance between them; an extended reference information database that holds extended reference information generated by extending and expressing the reference information using the generalized expression means; relevance detection means for measuring the strength of the relevance between the extended reference information and extended log data, the extended log data being generated by extending and expressing log data using the generalized expression means, based on the distance in the mapped space, and for detecting extended reference information that is deeply related to the extended log data; and template creating means for creating a predetermined template that summarizes the log data using the extended reference information detected by the relevance detection means.
- An information organizing method according to the present invention includes: registering log data; generating extended reference information by extending and expressing reference information using generalized expression means that maps metric and non-metric data to a space such that the more similar the data are, the closer the distance between them; generating extended log data by extending and expressing the log data using the generalized expression means; measuring the strength of the relevance between the extended reference information and the extended log data based on the distance in the mapped space; detecting extended reference information that is deeply related to the extended log data; and creating a predetermined template that summarizes the log data using the detected extended reference information.
- A program according to the present invention causes a computer to execute a process of creating a predetermined template from registered log data. The process includes: generating extended reference information by extending and expressing reference information using generalized expression means that maps metric and non-metric data to a space such that the more similar the data are, the closer the distance between them; generating extended log data by extending and expressing the log data using the generalized expression means; measuring the strength of the relevance between the extended reference information and the extended log data based on the distance in the mapped space; detecting extended reference information deeply related to the extended log data; and creating, using the detected extended reference information, a predetermined template that summarizes the log data.
- According to the present invention, it is possible to provide an information organization system and an information organization method capable of selecting, from a large amount of log data, information considered to be highly important and characteristic, and of displaying summary information quickly and automatically.
- FIG. 1 is a block diagram showing an information organizing system according to the present embodiment.
- The information organization system includes a reference information database 1 that holds feature points and feature point information related to the feature points (hereinafter also referred to as reference information), generalized expression means 2, an extended reference information database 3 that holds the reference information generalized and expressed by the generalized expression means 2, probabilistic indexing means 4, and an index table 5 in which the extended reference information is indexed using the probabilistic indexing means 4.
- Log data 6 uploaded by the user from an upload device is converted, using the generalized expression means 2, into extended log data 7 expressed in a space in which a distance is defined (typically as one point in a vector space).
- The relevance detecting means 8 probabilistically detects, among the extended reference information registered in the index table 5, the extended reference information deeply related to the extended log data 7.
- Hereinafter, reference information and extended reference information may be referred to simply as feature points.
- The information organizing system further includes a related feature point set 9, which is the set of extended reference information detected by the relevance detecting means 8; a scoring policy 11 for ranking the feature points (extended reference information); ordering means 10 that ranks the related feature point set 9 according to importance; a feature point list 12 ranked by the ordering means 10; and template creation means 13 that creates, based on the feature point list 12, a template 14 composed of the log data 6 and the feature point information deeply related to it.
- The template 14 is document information represented by html data, such as a blog.
- The reference information database 1 is a database of information, such as famous places and transportation points, that is generally useful when a user creates summarized information such as a travel note from log data.
- The information stored in the reference information database 1 includes feature points and feature point information (information including feature quantities), which is information related to the feature points.
- Each feature point is associated with basic information such as its name, geodetic coordinates, and the type (category) of the place, and with detailed information (feature point information) represented by feature descriptions and user review comments.
- The feature point information includes metric information, which is quantified in a metric vector space such as geodetic coordinates (for example, a three-dimensional space of latitude, longitude, and altitude), and non-metric information that describes the features of the feature point (for example, category information).
- The generalized expression means 2 quantifies the features of the feature point information, appropriately combining metric information (for example, geodetic coordinates) and non-metric information (for example, category information), and expresses them as a point in a multidimensional vector space. For example, feature points having similar non-metric features are placed closer to each other in the space.
- For example, the generalized vector space is the direct sum of the vector space used to represent the metric information and the vector space in which the non-metric information is quantified, and its dimension is (dimension of the metric vector space) + (dimension of the vector space obtained by quantifying the non-metric information).
- However, the present invention is not necessarily limited to this. Any means may be used as long as feature points having similar features are placed at nearby positions in the vector space and the relationship between feature points is reflected in their spatial positional relationship.
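- The direct-sum construction described above can be illustrated with a short sketch. Everything concrete here is an assumption made for illustration: the category list, the one-weight-per-category encoding, the `category_scale` parameter that balances geographic units against category units, and the use of Euclidean distance.

```python
import math

# Hypothetical category list; the patent does not fix one.
CATEGORIES = ["temple", "museum", "nature"]


def generalize(lat, lon, alt, category_weights, category_scale=1.0):
    """Concatenate metric coordinates with a quantified category vector.

    The result has (3 + number of categories) dimensions, i.e. the direct
    sum of the metric space and the quantified non-metric space.
    """
    cat_vec = [category_scale * category_weights.get(c, 0.0) for c in CATEGORIES]
    return [lat, lon, alt] + cat_vec


temple_a = generalize(35.0, 135.0, 0.0, {"temple": 1.0})
temple_b = generalize(35.0, 135.1, 0.0, {"temple": 1.0})
museum = generalize(35.0, 135.1, 0.0, {"museum": 1.0})

# Same place as temple_b, but the museum ends up farther from temple_a
# because its category components differ.
print(math.dist(temple_a, temple_b), math.dist(temple_a, museum))
```

- How strongly category similarity should count against geographic distance is a design choice; the `category_scale` parameter is one hypothetical way to expose it.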
- The extended reference information database 3 is a database in which, for each feature point registered in the reference information database 1, the detailed information is quantified by the generalized expression means 2 and replaced with its generalized expression.
- That is, the generalized expression corresponding to each feature point is registered against the ID information (or name) of that feature point.
- The probabilistic indexing means 4 indexes the feature points registered in the extended reference information database 3 using probabilistic neighborhood detecting means designed so that the closer two feature points are to each other, the higher the probability that they are given the same index-table entry ID.
- An example of such means is the approximate nearest neighbor search method LSH (Locality Sensitive Hashing).
- In the following, LSH is described as a representative example of the probabilistic neighborhood detecting means.
- Any method other than LSH may be used as long as it realizes the same function as LSH.
- LSH is a function and method for associating a point in the vector space with the ID (label) of an entry in a hash table.
- LSH is designed so that the closer two points are, the more likely they are to be hashed to the same entry.
- LSH is applied to problems such as neighborhood detection (given a query vector, detect the vectors near it). Details of the algorithm are described in, for example, the work of Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni (Brooklyn, New York, USA).
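- As a rough illustration of how an LSH function can map a point to an entry ID, the following sketch uses Gaussian random projections followed by quantization, a scheme of the p-stable type. It is not taken from the patent: the class name, the parameters `num_hashes` and `bucket_width`, and the fixed seed are assumptions.

```python
import math
import random


class PStableLSH:
    """Random-projection LSH: nearby points tend to share the same entry ID."""

    def __init__(self, dim, num_hashes=4, bucket_width=1.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # One Gaussian projection vector and one random offset per hash function.
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_hashes)]
        self.b = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]

    def entry_id(self, v):
        """Return the entry ID of v as a tuple of hash values."""
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in zip(self.a, self.b)
        )


lsh = PStableLSH(dim=3, seed=42)
print(lsh.entry_id((0.5, 0.5, 1.0)))
```

- A larger `bucket_width` makes collisions between nearby points more likely at the cost of coarser entries, while more hash functions make each entry more selective; both are tuning choices rather than values given in the source.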
- The index table 5 is an information table in which a plurality of feature points are registered under each entry identified by an entry ID.
- When an entry ID is designated as a key, the pointers to the feature point information registered in that entry, and through them the detailed information itself, can be referenced.
- When LSH is used, for example, the entry ID (a set of hash values) is the key, and the feature point information registered in the entry can be referenced.
- Alternatively, the table may be designed so that points that are close in the generalized vector space are registered in the same entry based on strict distance calculation.
- One example is a technique based on Voronoi division.
- The log data 6 is information acquired by a user and uploaded to a server, such as GPS geodetic coordinate information acquired by a GPS logger or photograph data taken by a digital camera.
- The extended log data 7 is obtained by quantifying and generalizing the log data 6 with the generalized expression means 2, in the same manner as for the extended reference information database 3 described above.
- In doing so, user-specific information such as preference information (for example, category information in which the user has strong interest) is also taken into account.
- The log data 6 is extended with the category information that is of strong interest to the user in addition to the GPS data, so that the generalized representation of the log data 6 is placed close to the generalized representations of the feature points that are of interest to the user.
- A method of setting a predetermined initial value is used, for example.
- The relevance detection means 8 extracts, from the extended feature point information registered in the index table 5, the items deeply related to the extended log data 7 and outputs them as the related feature point set 9. For example, using LSH, the relevance detection means 8 can treat closeness in spatial distance as high relevance. Specifically, the extended log data 7 is input to the LSH, the output entry ID (a set of hash values) is obtained, and the feature point information registered in the index table 5 under that entry ID is extracted. By construction of the index table 5, items that are close in the generalized vector space, that is, highly related items, are likely to be registered in entries with the same table label. Therefore, the feature points registered in the entry whose table label matches that of a data point in the extended log data 7 can be regarded as highly related to that data point.
- The relevance detection means 8 is not necessarily limited to LSH. Any method may be used as long as it can extract, from the feature points registered in the index table 5, those deeply related to the extended log data 7 and output the related feature point set 9.
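- The index table lookup described above can be sketched as follows. To keep the example deterministic, the LSH is replaced by a simple grid quantization (`entry_id`), which is only a stand-in and not the method of the patent; the function names, the cell size, and the sample vectors (two geographic components plus one category component) are likewise assumptions.

```python
import math
from collections import defaultdict


def entry_id(vec, cell=1.0):
    """Stand-in for the LSH output: quantize each coordinate to a grid cell."""
    return tuple(math.floor(x / cell) for x in vec)


def build_index(feature_points, cell=1.0):
    """Offline step: register each feature point under its entry ID."""
    table = defaultdict(list)
    for name, vec in feature_points.items():
        table[entry_id(vec, cell)].append(name)
    return table


def related_feature_points(table, extended_log, cell=1.0):
    """Online step: collect the feature points sharing an entry with any log point."""
    related = []
    for vec in extended_log:
        for name in table.get(entry_id(vec, cell), []):
            if name not in related:
                related.append(name)
    return related


# Hypothetical extended reference information: (x, y, category component).
features = {"A": (0.2, 0.3, 1.0), "B": (0.4, 0.1, -1.0), "C": (5.5, 5.5, 1.0)}
index_table = build_index(features)
# Extended log data carrying the user's preferred category component (1.0).
extended_log = [(0.6, 0.6, 1.0), (0.9, 0.2, 1.0)]
print(related_feature_points(index_table, extended_log))
```

- Feature point B is geographically close to the log points but falls in a different entry because its category component differs, which is the effect the generalized representation is meant to produce.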
- The feature point detection method described above does not extract feature points using only the physical proximity between the log data 6 and the geodetic coordinates of the feature points; it also takes user preference information into account.
- Closeness to a feature point, including user context such as preference information, is expressed as strength of relevance, so that the higher the relevance (the closer in the generalized vector space), the more likely the feature point is to be extracted as information of high interest and value to the user.
- The probabilistic indexing means 4 based on LSH is used mainly to reduce calculation cost, with emphasis on speed. If calculation cost is not a concern, other neighboring point detection methods, such as neighborhood calculation by strict distance computation or Voronoi division, can also be used.
- The scoring policy 11 is defined and provided for each user, and describes information, rules, and the like for prioritizing the extracted feature points by importance.
- The number of feature points to be extracted can be set freely. However, if the number is too small, it is difficult to extract feature points that match the user's taste. If it is too large, calculation takes time and less important information is included, which reduces usefulness. Therefore, in this embodiment, it is desirable to extract a suitable number according to the calculation cost and to display them as important information in descending order of score.
- For example, the scoring policy 11 describes rules that rank a feature point higher based on prior knowledge about important feature point information (for example, feature points rated highly by many other users, or feature points belonging to a category in which the user has shown strong interest in the past).
- It can also describe a rule that ranks a feature point higher when data collected by the user, such as digital camera photo data, lies in the vicinity of the feature point (judged not only by distance in the generalized vector space but also, for example, by GPS geodetic coordinates and shooting time), as evidence that the user is more interested in it.
- These descriptions of the scoring policy 11 are only examples, and the policy can be described arbitrarily based on the management policy of the administrator.
- The ordering means 10 ranks the related feature point set 9 by importance using the scoring policy 11 described above and outputs the result as the feature point list 12. If necessary, an upper limit on the number of selections can be set, for example selecting the top ten.
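- A scoring policy of the kind described above could be expressed as a small set of weighted rules. The following is only a sketch: the policy keys, the weights, the additive scoring formula, and the sample feature points are all hypothetical.

```python
def rank_feature_points(related, policy, top_n=10):
    """Order related feature points by a per-user scoring policy, highest first."""

    def score(fp):
        s = policy["rating_weight"] * fp["rating"]       # ratings by other users
        if fp["category"] in policy["preferred_categories"]:
            s += policy["category_bonus"]                # user's past interests
        if fp["has_user_photo_nearby"]:
            s += policy["photo_bonus"]                   # user took photos nearby
        return s

    return sorted(related, key=score, reverse=True)[:top_n]


# Hypothetical policy and related feature point set.
policy = {
    "rating_weight": 1.0,
    "preferred_categories": {"temple"},
    "category_bonus": 0.5,
    "photo_bonus": 0.3,
}
related = [
    {"name": "P1", "rating": 0.9, "category": "shopping", "has_user_photo_nearby": False},
    {"name": "P2", "rating": 0.6, "category": "temple", "has_user_photo_nearby": True},
    {"name": "P3", "rating": 0.7, "category": "temple", "has_user_photo_nearby": False},
]
feature_point_list = [fp["name"] for fp in rank_feature_points(related, policy, top_n=2)]
print(feature_point_list)
```

- Here the `top_n` argument plays the role of the upper limit on the number of selections mentioned above.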
- The template creation means 13 creates a template 14 according to a predetermined format based on the feature point list 12.
- The template 14 is, for example, document information described in a markup language such as xml or html.
- The basic structure of the template 14 consists of the feature points extracted based on the user log data, arranged along their temporal transition.
- An example is a travel record that describes, from GPS data input by the user, the characteristic sights passed from the start point to the end point and the connection information (means of transportation, required time, etc.) between the sights.
- The user can further edit this template 14.
- Feature point information that was extracted from the log data 6 as highly relevant but was not displayed as a result of ranking by the scoring policy 11, together with data closely related to it, is also reorganized around this template 14, which facilitates the user's editing work.
- It is assumed that the reference information (information on feature points such as sights) has been indexed in advance by offline processing using the generalized expression means 2, and that the extended reference information is registered in the index table 5.
- The client-side user process and the server-side process are as follows.
- The user logs in to the server system as necessary and uploads log data from various devices, typified by a GPS logger, to the server using a data reader or the like (step S1).
- The uploaded log data is processed, and its generalized representation, the extended log data 7, is obtained using the generalized expression means 2 (step S2).
- Here, data processing refers to a series of operations performed according to predetermined rules as required by the subsequent processing, for example thinning out unnecessary GPS data, or dimensional compression or dimensional expansion for matching dimensions. These are merely examples, and the data processing can be determined arbitrarily.
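- One possible form of the GPS thinning mentioned above is sketched below. The rule used here (keep a point only when it is at least `min_step` away from the last kept point, and always keep the end point) and the planar distance are assumptions; the source leaves the thinning rule open.

```python
import math


def thin_track(points, min_step):
    """Drop points closer than min_step to the previously kept point."""
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:]:
        if math.dist(p, kept[-1]) >= min_step:
            kept.append(p)
    if kept[-1] != points[-1]:
        kept.append(points[-1])  # always keep the end of the track
    return kept


# Hypothetical track in planar coordinates.
track = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (1.0, 0.0), (1.05, 0.0), (2.0, 0.0)]
print(thin_track(track, min_step=0.5))
```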
- The extended log data 7 is described in a format from which the entry ID of the corresponding index table 5 can be calculated.
- Next, the entry ID of the index table 5 to which the extended log data 7 is mapped is calculated through the above-described probabilistic indexing means (step S3).
- Feature points deeply related to each item of extended log data 7 are then detected from the entry ID obtained in step S3 (step S4).
- The feature points registered in the table entry having that entry ID have the highest relevance.
- If necessary, feature points with the next highest relevance are extracted, for example by searching neighboring table entries, until a predetermined number of feature points has been extracted.
- The number of feature points to be extracted is generally determined by a predetermined rule that depends on the indexing means, for example by setting a lower limit and an upper limit.
- The rule for determining the number of feature points to be extracted is not limited to this and can be determined arbitrarily.
- The extracted feature point set is ranked according to the scoring policy 11 (importance and priority) corresponding to the user (step S5).
- The scoring policy 11 can be defined based on various rules, such as preference information (user preferences, the reputations given by other users, and the like), in addition to the importance of feature points such as traffic points.
- The user's preference information can be defined from a behavior history, such as past behavior patterns and rating information, in addition to the user's profile.
- The user's preference information can be defined arbitrarily.
- A template expressing summary information of the log data is created according to a predetermined process based on the ordered feature point set (step S6).
- For example, when the log data is GPS data recorded while the user was traveling, the result is as shown in FIG. 3. In FIG. 3, a predetermined number of feature points 21, 22, and 23 closely related to the trajectory are extracted and displayed in time series as a template, based on the GPS data of the trip.
- In addition, feature point information 31, 32, and 33 corresponding to the feature points 21, 22, and 23 is displayed.
- The output shown in FIG. 3 is only an example, and the template expressing summary information of the log data can be determined arbitrarily.
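- As an illustration of step S6, the following sketch renders an ordered set of feature points as a small html fragment in time series. The function name, the `time`/`name`/`note` fields, and the markup layout are hypothetical; the patent only states that the template is document information such as html.

```python
from html import escape


def build_template(title, stops):
    """Render feature points in time order as a minimal html fragment."""
    items = "\n".join(
        f"  <li>{escape(s['time'])} {escape(s['name'])}: {escape(s['note'])}</li>"
        for s in sorted(stops, key=lambda s: s["time"])  # "HH:MM" strings sort by time
    )
    return f"<h1>{escape(title)}</h1>\n<ol>\n{items}\n</ol>"


# Hypothetical feature points selected for display.
page = build_template(
    "Trip & notes",
    [
        {"time": "11:30", "name": "Temple", "note": "highly rated by other users"},
        {"time": "09:00", "name": "Station", "note": "start point"},
    ],
)
print(page)
```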
- Feature points other than those used for display, and their related information, can be reorganized so that the user can easily edit them.
- The displayed template is merely a starting point, and the user can add or delete the feature points to be displayed based on the template information.
- For example, right-clicking on connection information may display information closely related to it using a pull-down menu or the like.
- The information closely related to the connection information is, for example, information that belongs to the connection section but was not selected as a result of the ranking in step S5.
- The data is regrouped in advance and associated with each feature point and each piece of connection information (hereinafter, feature points and connection information are referred to as display objects) so that information closely related to a display object can be displayed in priority order.
- The data can then be further edited, and additional information can easily be posted.
- The associated data includes, for example, detailed information such as comments and photographs previously associated with the feature point information registered in the reference information database 1, which can be added.
- Photo data and comment information from the user's digital camera can likewise be registered in the index table according to their relevance, based on creation time and place, by the same method as described above, and can be reorganized in the same way.
- The template information 40 shown in FIG. 4, which includes the feature points 41, 42, 43, and 44 and the connection information 45, 46, and 47 between them, is automatically generated based on an information group 50 consisting of the extracted feature point information 51, the general information 52 associated with the feature points, the log data 53, and the like. In this example, the information group 50 is disassembled and reorganized according to the structure of the template information 40, based on its relevance to the feature points 41, 42, 43, 44 and the connection information 45, 46, 47.
- For example, the connection information 45 is associated with information 60, which includes a related feature point subset 61, a log data candidate subset 62, and a general data candidate subset 63.
- Likewise, the feature point 42 is associated with information 70, which includes a log data candidate subset 71 and a general data candidate subset 72.
- The detailed information of a display object can be displayed automatically, using detailed information prepared in advance, according to a predetermined rule.
- For example, a note 48, such as text or photographic information describing the feature point 43 in detail, can be created automatically from general information related to the extracted feature point, associated with the feature point 43, and displayed on the screen automatically according to a rule such as its importance level.
- The created template information is transferred to the client side through the network (step S7).
- The user displays and checks the template information transferred from the server using the user terminal (step S8). The user can then edit the template information using the displayed template information and the log data reorganized by the above method (steps S9 and S10). When the user has finished editing the template information, the template creation work ends (step S11).
- With the information organization system and information organization method according to the present embodiment, related information that is not directly included in the user log data but is highly relevant to it, highly important to the user, and likely to attract strong interest can be displayed in association with the summary display of the log data.
- Next, a specific example in which template information is created using the information organization system and the information organization method according to the above-described embodiment is described. Here, a system that automatically outputs a travel template when a user uploads GPS data collected during a trip is described.
- FIG. 5 is a block diagram showing a specific example of the information organization system 80 according to the present embodiment.
- The information organization system 80 includes a user terminal 81, a web server 82, an application server 83, and a database server 84.
- The user has a GPS logger 85 as a user device.
- The user terminal 81 is connected to the web server 82 via a network, and the two can exchange data.
- The user accesses the web server 82 from the user terminal 81, logs in with a user-specific account through, for example, the web page 86, and uploads the log data recorded during the trip to the web server 82.
- The application server 83 includes a template creation application 83_1, indexing means 83_2, policy information 83_3, and reconfiguration data 83_4.
- The database server 84 includes an index table 84_1, user information 84_2, and a reference information database 84_3.
- In the reference information database 84_3 of the database server 84, data on feature points such as that shown in FIG. 6 is registered. Each feature point is expanded to a generalized expression, indexed using LSH so that feature points close in spatial distance are treated as highly relevant, and stored in the index table 84_1.
- The database server 84 performs this processing in advance as offline processing.
- Feature points, geodetic coordinate values, category information, and rating information are registered in the reference information database 84_3.
- This generalized vector space is expressed as a direct sum of a three-dimensional physical geodetic coordinate space and a vector space expressing category information.
- the category information space can generally be expressed in a K-dimensional space for a certain positive integer K.
- K the number of categories
- feature point A is (a1, a2, 1)
- feature point B is (b1, b2, -1)
- feature point C is (c1, c2, 1)
- feature point D is (d1, d2, 1); these are located at point 91, point 92, point 93, and point 94, respectively, in the three-dimensional vector space shown in FIG.
- a1, a2, b1, b2, c1, c2, d1, and d2 are component values in a two-dimensional space ignoring the height direction of the geodetic coordinates of the feature points A, B, C, and D.
- These feature points are registered in an index table 84_1 stored in the database server 84.
- the index table 84_1 may be stored in the application server 83.
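The offline indexing described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses a p-stable (Gaussian projection) hash, which is one common LSH variant, whereas the description does not say which LSH scheme is used. The class name `LshIndex`, its parameters, and the coordinate values are hypothetical.

```python
import math
import random
from collections import defaultdict


class LshIndex:
    """Toy index table keyed by LSH entry IDs (hypothetical stand-in for index table 84_1)."""

    def __init__(self, dim, num_hashes=2, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.bucket_width = bucket_width
        # One random Gaussian direction and one random offset per hash function.
        self.directions = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                           for _ in range(num_hashes)]
        self.offsets = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]
        self.table = defaultdict(list)

    def entry_id(self, vector):
        """Quantize each random projection; nearby vectors tend to share an ID."""
        return tuple(
            math.floor((sum(a * x for a, x in zip(direction, vector)) + offset)
                       / self.bucket_width)
            for direction, offset in zip(self.directions, self.offsets)
        )

    def register(self, name, vector):
        """Store a feature point under the entry ID of its generalized vector."""
        self.table[self.entry_id(vector)].append(name)

    def lookup(self, vector):
        """Return the feature points registered under the same entry ID."""
        return list(self.table.get(self.entry_id(vector), []))


# Illustrative generalized vectors: (east, north, category). Values are made up.
index = LshIndex(dim=3)
index.register("A", (3.0, 1.0, 1.0))
index.register("B", (5.0, 1.0, -1.0))
```

Because the hash is randomized, two nearby vectors share an entry ID only with high probability, not with certainty, which is why a fallback to neighboring entries is useful at query time.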
- the rating information of other users can also be designated in advance as real numbers in the range [0, 1] for these feature points.
- the larger the rating information value the higher the evaluation (higher popularity).
- the feature point A is 0.8
- the feature point B is 0.7
- the feature point C is 0.5
- the feature point D is 0.9.
- the GPS log data input by the user is discrete data in which geodetic coordinates from point X (x1, x2) to point Y (y1, y2) are arranged at appropriate time intervals.
- This GPS log data forms a straight locus (trajectory 98) from the point X (x1, x2) to the point Y (y1, y2).
- The user's preference information is expressed by a vector component in the generalized vector space. It can be determined, for example, from the category information of photo data uploaded in the past (many pictures of mountains, etc.).
- the component value related to the user's interest category is 0.8
- The representation of the locus of the GPS log data in the generalized vector space is obtained by expanding the locus in the two-dimensional physical geodetic coordinate space, as shown in FIG., along the category information space into a trajectory in three-dimensional space, resulting in the trajectory 99 on the surface 95.
- The representation of the user's GPS log data in the generalized vector space (hereinafter referred to as a generalized trajectory) is the set of points on the straight line from (x1, x2, 0.8) to (y1, y2, 0.8). The geometric positions of the start point and end point of the generalized trajectory are represented by points 96 and 97 in FIG. 7, respectively, and the generalized trajectory is represented by the trajectory 99, in contrast to the trajectory 98 in geodetic coordinates. That is, the extended log data obtained by expressing the trajectory 98, which is the log data, in extended form using the generalized expression means is the trajectory 99.
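The expansion of the log data into a generalized trajectory can be sketched in Python as follows. This is a minimal illustration, assuming a single category axis and a constant interest value of 0.8 along the whole track. The function names and the coordinates chosen for points X and Y are hypothetical.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def interpolate(start: Point2D, end: Point2D, count: int) -> List[Point2D]:
    """Return `count` evenly spaced points from start to end inclusive (count >= 2)."""
    return [
        (start[0] + (end[0] - start[0]) * i / (count - 1),
         start[1] + (end[1] - start[1]) * i / (count - 1))
        for i in range(count)
    ]


def extend_log(track: List[Point2D], interest: float) -> List[Point3D]:
    """Append the user's interest component to every geodetic point of the log."""
    return [(x, y, interest) for x, y in track]


# Discrete GPS log from X = (0, 0) to Y = (10, 0); coordinates are made up.
trajectory_98 = interpolate((0.0, 0.0), (10.0, 0.0), 5)
# Generalized trajectory lying in the plane "category = 0.8".
trajectory_99 = extend_log(trajectory_98, 0.8)
```

Every point of `trajectory_99` has the same third component, so the generalized trajectory stays in a fixed plane, as in the simplified example of the description.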
- the web server 82 issues a template file creation request to the application server 83.
- In the application server 83, the application 83_1 for creating a template from user log data is installed.
- When the application server 83 acquires the log data uploaded via the web server 82, it creates the extended log data by processing the data as described above in accordance with the user preference information and converting it into a generalized expression.
- The application server 83 inputs the created extended log data into the LSH, examines the entry ID that is output, and, using the entry ID as a key, extracts a number of closely related feature points within a predetermined range from the index table 84_1 in the database server 84. If the number of extracted items falls short of the predetermined range, data is acquired in order from neighboring table entries, and acquisition stops once the number reaches the predetermined range.
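The fallback to neighboring table entries can be sketched in Python as follows. This is an illustration under assumptions: entry IDs are taken to be plain integers so that "neighboring" means adjacent IDs, and the search gives up after a fixed radius. The function name, the `max_radius` parameter, and the table contents are hypothetical.

```python
def collect_candidates(index_table, entry_id, wanted, max_radius=3):
    """Gather up to `wanted` items, widening to neighboring entries when short."""
    found = list(index_table.get(entry_id, []))
    radius = 1
    while len(found) < wanted and radius <= max_radius:
        # Visit the entries at distance `radius` on both sides of the key.
        for neighbor in (entry_id - radius, entry_id + radius):
            for item in index_table.get(neighbor, []):
                if len(found) >= wanted:
                    break
                found.append(item)
        radius += 1
    # Stop acquiring as soon as the requested number is reached.
    return found[:wanted]


# Made-up index table: entry ID -> feature points registered there.
index_table = {4: ["A"], 5: ["C"], 7: ["D"]}
```

With this table, asking for two items under entry ID 4 takes "A" from the entry itself and "C" from the adjacent entry 5.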
- the number of feature points to be extracted is two.
- the feature point A and the feature point C are extracted because they are adjacent in the generalized expression.
- Because the feature point B is distant along the category information axis, it is not detected as being in the vicinity of the generalized trajectory. This result reflects that this user is more interested in mountains than in theme parks.
- The feature point D is not extracted because its physical distance from the user's trajectory is large, so it is determined to be irrelevant to the log data.
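The extraction result for feature points A to D can be reproduced with a small Python sketch. Note the substitution: it uses an exact brute-force distance search in place of LSH, simply to make the geometry visible. All coordinates, the category value -1 for feature point B, and the relative scaling between the geodetic axes and the category axis are assumptions chosen for illustration.

```python
import math


def nearest_features(features, trajectory, count):
    """Return the `count` feature names closest to any point of the trajectory."""
    def distance_to_trajectory(vector):
        return min(math.dist(vector, point) for point in trajectory)

    # Break ties by name so that the result is deterministic.
    ranked = sorted(features,
                    key=lambda name: (distance_to_trajectory(features[name]), name))
    return ranked[:count]


# Made-up generalized vectors: (east, north, category).
features = {
    "A": (3.0, 1.0, 1.0),   # near the route, same category as the user's interest
    "B": (5.0, 1.0, -1.0),  # near the route, but far along the category axis
    "C": (7.0, 1.0, 1.0),   # near the route, same category as the user's interest
    "D": (5.0, 9.0, 1.0),   # same category, but physically far from the route
}
generalized_trajectory = [(float(i), 0.0, 0.8) for i in range(11)]

extracted = nearest_features(features, generalized_trajectory, count=2)
```

In this setup B is excluded by its distance along the category axis and D by its physical distance, which mirrors the reasoning given for the two feature points.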
- The extracted feature points are then ordered based on the user information 84_2 stored in the database server 84 and the policy information 83_3 stored in the application server 83, to decide which objects to display. Although various types of policy information can be defined, this embodiment assumes for simplicity that the rating information of other users is used. The feature point A, which has the higher rating, is therefore sorted above the feature point C, and the feature point A is used preferentially over the feature point C.
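The ordering step can be sketched in Python as follows, using the rating values given for feature points A to D. The function name `order_by_policy` and the idea of passing the policy as a scoring function are assumptions made for illustration; the description only states that rating information is used as the policy in this example.

```python
# Rating information of other users, as given for feature points A to D.
ratings = {"A": 0.8, "B": 0.7, "C": 0.5, "D": 0.9}


def order_by_policy(candidates, score):
    """Sort candidates so that the highest-scoring feature point comes first."""
    return sorted(candidates, key=score, reverse=True)


def rating_policy(name):
    """Simplest policy: score a feature point by its rating."""
    return ratings[name]


ordered = order_by_policy(["C", "A"], rating_policy)
```

Only the extracted candidates are ordered, so feature point D never appears in the result even though it has the highest rating.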
- the template (travel book) shown in FIG. 8 has a start point X and an end point Y.
- At the start point X, information 101 about the start point X is displayed.
- At the end point Y, information 103 about the end point Y is displayed.
- The extracted feature point A is the feature point that appears most relevant to the user on the path from the start point X to the end point Y, and information 102 about the feature point A is displayed automatically.
- Connection information 104 for the route XA and connection information 105 for the route AY are created automatically at the same time, and information such as elapsed time can be displayed.
- The information on the feature point C, the other detected feature point, is reconfigured so as to be associated with the connection information object between the feature point A and the end point Y.
- When the user selects the connection information AY and edits it to add new information, that information can be displayed preferentially.
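The structure of the resulting template can be sketched in Python as follows. This is a hypothetical data model, not the one used in the embodiment: the class names, the `position` map giving each point a scalar position along the route, and the rule of attaching a secondary feature point to the connection that brackets its position are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Connection:
    """Connection information between two consecutive stops."""
    start: str
    end: str
    related: List[str] = field(default_factory=list)


@dataclass
class Template:
    """Travel-book template: displayed stops plus the connections between them."""
    stops: List[str]
    connections: List[Connection]


def build_template(start: str, end: str, ordered_features: List[str],
                   position: Dict[str, float]) -> Template:
    """Show the top-ranked feature point as a stop; attach the rest to connections."""
    primary, *others = ordered_features
    stops = [start, primary, end]
    connections = [Connection(a, b) for a, b in zip(stops, stops[1:])]
    for name in others:
        for connection in connections:
            if position[connection.start] <= position[name] <= position[connection.end]:
                connection.related.append(name)
                break
    return Template(stops, connections)


# Made-up positions along the route from X to Y.
position = {"X": 0.0, "A": 3.0, "C": 7.0, "Y": 10.0}
template = build_template("X", "Y", ["A", "C"], position)
```

Here feature point C ends up attached to the connection between A and Y, so it is available when the user edits that connection.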
- the data set in the present embodiment is merely a simplified example.
- generalized expressions can add measurement data such as time information in addition to GPS information. It is also possible to express more complex information including category information and other non-metric data, with higher dimensions.
- The generalized trajectory need not be a trajectory in a fixed plane; it may be extended so as to lie on a curved surface that depends on location or the like, or it may be expressed by multiple generalized trajectories that are probabilistically weighted.
- With the information organizing system and information organizing method according to the present invention described above, travel records, action records, and the like can be created automatically, with excellent responsiveness, by uploading log data to a server.
- The information organization system and information organization method according to the present invention are also applicable to uses such as recommending related information, for example highly relevant shops and tourist spots, from the user's behavior pattern based on the log data, or displaying advertisements that include closely related information.
- The information organizing system according to the present embodiment includes: a reference information database 1 that holds reference information; generalized expression means 2 that maps metric and non-metric data to a space so that the more similar the data are to each other, the closer their distance; an extended reference information database 3 that holds extended reference information generated by expressing the reference information in extended form using the generalized expression means 2; extended log data 7 generated by expressing log data 6 in extended form using the generalized expression means 2; relevance detection means 8 that measures the strength of the relationship between the extended reference information and the extended log data based on the distance in the mapped space and detects extended reference information closely related to the extended log data; and template creation means 13 that creates a predetermined template summarizing the log data using the detected extended reference information.
- Since each component shown in FIG. 9 has already been described with reference to FIG. 1, a detailed description thereof is omitted.
- The program for causing a computer to execute processing for creating a predetermined template from registered log data causes the computer to execute the following steps.
- A step of generating extended reference information by expressing the reference information in extended form using generalized expression means that maps metric and non-metric data to a space so that the more similar the data are to each other, the closer their distance.
- A step of generating extended log data by expressing the registered log data in extended form using the generalized expression means.
- A step of measuring the strength of the relationship between the extended reference information and the extended log data based on the distance in the mapped space, and detecting extended reference information closely related to the extended log data.
- A step of creating a predetermined template that summarizes the log data using the detected extended reference information.
- Non-transitory computer readable media include various types of tangible storage media.
- Examples of non-transitory computer-readable media include magnetic recording media (e.g., flexible disks, magnetic tapes, hard disk drives), magneto-optical recording media (e.g., magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memory (e.g., mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory)).
- The program may also be supplied to the computer by various types of transitory computer-readable media. Examples of transitory computer-readable media include electrical signals, optical signals, and electromagnetic waves.
- A transitory computer-readable medium can supply the program to the computer via a wired communication path, such as an electric wire or an optical fiber, or via a wireless communication path.
Description
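A step of generating extended reference information by expressing the reference information in extended form using generalized expression means that maps metric and non-metric data to a space so that the more similar the data are to each other, the closer their distance.
A step of generating extended log data by expressing the registered log data in extended form using the generalized expression means.
A step of measuring the strength of the relationship between the extended reference information and the extended log data based on the distance in the mapped space, and detecting extended reference information closely related to the extended log data.
A step of creating a predetermined template that summarizes the log data using the detected extended reference information.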
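2 Generalized expression means
3 Extended reference information database
4 Probabilistic indexing means
5 Index table
6 Log data
7 Extended log data
8 Relevance detection means
9 Related feature point set
10 Ordering means
11 Scoring policy
12 Feature point list
13 Template creation means
14 Template
21, 22, 23 Feature points
24, 25 Routes
31, 32, 33 Feature point information
34, 35 Connection information
40 Template information
41, 42, 43, 44 Feature points
45, 46, 47 Connection information
48 Feature point note
50 Information group
51 Extracted feature point information
52 General information associated with feature points
53 Log data
60 Information associated with connection information
61 Related feature point subset
62 Log data candidate subset
63 General data candidate subset
70 Information associated with feature points
71 Log data candidate subset
72 General data candidate subset
80 Information organization system
81 User terminal
82 Web server
83 Application server
83_1 Template creation application
83_2 Indexing means
83_3 Policy information
83_4 Reconfiguration data
84 Database server
84_1 Index table
84_3 Reference information database
84_2 User information
85 GPS logger
86 Web page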
Claims (10)
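- An information organizing system comprising:
a reference information database that holds reference information;
generalized expression means for mapping metric and non-metric data to a space so that the more similar the data are to each other, the closer their distance;
an extended reference information database that holds extended reference information generated by expressing the reference information in extended form using the generalized expression means;
extended log data generated by expressing log data in extended form using the generalized expression means;
relevance detection means for measuring the strength of the relationship between the extended reference information and the extended log data based on the distance in the mapped space, and detecting extended reference information closely related to the extended log data; and
template creation means for creating a predetermined template that summarizes the log data using the extended reference information detected by the relevance detection means.
- The information organizing system according to claim 1, further comprising probabilistic indexing means for registering pieces of the extended reference information in the same index table with a higher probability the closer they are to each other.
- The information organizing system according to claim 2, wherein the relevance detection means detects the extended reference information registered in the index table using an entry ID obtained based on the extended log data.
- The information organizing system according to any one of claims 1 to 3, wherein the dimension of the extended log data is extended so that the dimension of the extended reference information and the dimension of the extended log data are the same.
- The information organizing system according to any one of claims 1 to 4, further comprising ordering means for ordering the extended reference information detected by the relevance detection means based on a predetermined scoring policy.
- The information organizing system according to any one of claims 1 to 5, wherein at least one of the extended reference information and the extended log data is reconfigured in association with the template created by the template creation means.
- The information organizing system according to any one of claims 1 to 6, wherein the reference information database includes feature points and feature point information, which is information related to the feature points, and the feature point information includes metric information and non-metric information.
- The information organizing system according to any one of claims 1 to 7, wherein the log data is a set of data created by a user, data measured by the user, and data to which information on places and times related to these data has been added.
- An information organizing method comprising:
registering log data;
generating extended reference information by expressing reference information in extended form using generalized expression means that maps metric and non-metric data to a space so that the more similar the data are to each other, the closer their distance;
generating extended log data by expressing the log data in extended form using the generalized expression means;
measuring the strength of the relationship between the extended reference information and the extended log data based on the distance in the mapped space, and detecting extended reference information closely related to the extended log data; and
creating a predetermined template that summarizes the log data using the detected extended reference information.
- A non-transitory computer-readable medium that causes a computer to execute processing of:
generating extended reference information by expressing reference information in extended form using generalized expression means that maps metric and non-metric data to a space so that the more similar the data are to each other, the closer their distance;
generating extended log data by expressing registered log data in extended form using the generalized expression means;
measuring the strength of the relationship between the extended reference information and the extended log data based on the distance in the mapped space, and detecting extended reference information closely related to the extended log data; and
creating a predetermined template that summarizes the log data using the detected extended reference information.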
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/577,409 US9116916B2 (en) | 2010-02-16 | 2011-01-18 | Information organizing sytem and information organizing method |
JP2012500481A JP5900323B2 (ja) | 2010-02-16 | 2011-01-18 | Information organizing system and information organizing method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010031533 | 2010-02-16 | ||
JP2010-031533 | 2010-12-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011102076A1 true WO2011102076A1 (ja) | 2011-08-25 |
Family
ID=44482685
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2011/000210 WO2011102076A1 (ja) | 2010-02-16 | 2011-01-18 | Information organizing system and information organizing method |
Country Status (3)
Country | Link |
---|---|
US (1) | US9116916B2 (ja) |
JP (1) | JP5900323B2 (ja) |
WO (1) | WO2011102076A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017162498A (ja) * | 2017-05-08 | 2017-09-14 | Nikon Corporation | Image evaluation server |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9785883B2 (en) | 2012-04-27 | 2017-10-10 | Excalibur Ip, Llc | Avatars for use with personalized generalized content recommendations |
US8996530B2 (en) * | 2012-04-27 | 2015-03-31 | Yahoo! Inc. | User modeling for personalized generalized content recommendations |
US9836545B2 (en) | 2012-04-27 | 2017-12-05 | Yahoo Holdings, Inc. | Systems and methods for personalized generalized content recommendations |
US9804737B2 (en) | 2014-01-27 | 2017-10-31 | Groupon, Inc. | Learning user interface |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001290727A (ja) * | 2000-04-06 | 2001-10-19 | Nec Corp | Information providing system and information providing method |
JP2002245061A (ja) * | 2001-02-14 | 2002-08-30 | Seiko Epson Corp | Keyword extraction |
JP2002278993A (ja) * | 2001-03-16 | 2002-09-27 | Nippon Telegr & Teleph Corp <Ntt> | Image data registration and reproduction method, system, program, and recording medium therefor |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
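JP2003242069A (ja) | 2002-02-20 | 2003-08-29 | Japan Telecom Co Ltd | Information distribution system and information distribution method |
JP2003288354A (ja) | 2002-03-28 | 2003-10-10 | Seiko Epson Corp | Method for automatically creating action records, information recording medium, and system for automatically creating action records |
KR100724639B1 (ko) * | 2006-06-19 | 2007-06-04 | 삼성전자주식회사 | Digital multimedia broadcasting receiver with location registration and notification functions, and registration and notification method thereof |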
US20080208847A1 (en) * | 2007-02-26 | 2008-08-28 | Fabian Moerchen | Relevance ranking for document retrieval |
US20090048929A1 (en) * | 2007-08-15 | 2009-02-19 | Paul Im | Authenticated travel record |
US20090100063A1 (en) * | 2007-10-10 | 2009-04-16 | Henrik Bengtsson | System and method for obtaining location information using a networked portable electronic device |
US20100179754A1 (en) * | 2009-01-15 | 2010-07-15 | Robert Bosch Gmbh | Location based system utilizing geographical information from documents in natural language |
US20110035329A1 (en) * | 2009-08-07 | 2011-02-10 | Delli Santi James W | Search Methods and Systems Utilizing Social Graphs as Filters |
-
2011
- 2011-01-18 US US13/577,409 patent/US9116916B2/en active Active
- 2011-01-18 JP JP2012500481A patent/JP5900323B2/ja active Active
- 2011-01-18 WO PCT/JP2011/000210 patent/WO2011102076A1/ja active Application Filing
Non-Patent Citations (2)
Title |
---|
MAYUR DATAR ET AL.: "Locality-Sensitive Hashing Scheme Based on p-Stable Distributions", SCG '04 PROCEEDINGS OF THE TWENTIETH ANNUAL SYMPOSIUM ON COMPUTATIONAL GEOMETRY, ACM, 2004, 2004, pages 253 - 262, Retrieved from the Internet <URL:http://portal.acm.org/ft_gateway.cfm?id=997857&type=pdf> [retrieved on 20110204] * |
TETSUO ISHIBASHI ET AL.: "Approximate Hierarchical Clustering Algorithm Using Locality-Sensitive Hashing, 2003-CVIM-141", IPSJ SIG NOTES, vol. 2003, no. 109, 7 November 2003 (2003-11-07), pages 57 - 62 * |
Also Published As
Publication number | Publication date |
---|---|
JP5900323B2 (ja) | 2016-04-06 |
JPWO2011102076A1 (ja) | 2013-06-17 |
US20120310938A1 (en) | 2012-12-06 |
US9116916B2 (en) | 2015-08-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 11744371 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2012500481 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 13577409 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 11744371 Country of ref document: EP Kind code of ref document: A1 |