WO2012046436A1 - Document analysis device, document analysis method, and document analysis program - Google Patents

Info

Publication number
WO2012046436A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
vector
document vector
pseudo
unit
Prior art date
Application number
PCT/JP2011/005587
Other languages
English (en)
Japanese (ja)
Inventor
水嶋 靖和
Original Assignee
旭化成株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 旭化成株式会社
Priority to JP2012537582A (JPWO2012046436A1)
Publication of WO2012046436A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/93 Document management systems
    • G06F16/94 Hypermedia

Definitions

  • The present invention relates to a document analysis apparatus, a document analysis method, and a document analysis program for predicting documents that will appear in the future based on existing documents.
  • FIG. 8 is a diagram showing an example of a patent map that represents future trend prediction from trends in parameter values. Each point in the map represents an existing patent publication. Each patent is expressed by the combination of the values of parameter A and parameter B that constitute it, and the point representing the patent publication is placed at the position in the map corresponding to that combination.
  • Each point in the map is accompanied by the filing year of the corresponding patent.
  • Displaying the filing year in this way reveals filing tendencies, such as which combinations of parameter A and parameter B were realized in which order of filing year. A combination of parameter A and parameter B that may be taken in the future can then be predicted from the positional relationship of the points in the map.
  • The number of parameters constituting a patent is not limited to two as shown in FIG. 8; a patent is often constituted by three or more types of parameters.
  • However, the maximum number of parameters that can be displayed in one map is three (three-dimensional display). It is difficult, for example, to estimate the development tendency from ten parameters.
  • In such a case, two types of parameters had to be selected from the ten types, and a patent map created using the two selected parameters. A total of 45 patent maps (the number of ways to choose 2 parameters out of 10) are then created, but it is difficult to grasp the trend of the entire technology and predict future trends from all 45 patent maps.
  • In Patent Document 1, a document stored in a document information storage unit is expressed by the n words constituting the document and arranged on a two-dimensional map. A method of predicting future trends is shown in which the time transition of each document on the two-dimensional map is read from the time information given to the document and the secular change is captured as a trajectory.
  • In Patent Document 1, documents that match a map generation condition are extracted from the document database. Then, for each extracted document, the document coordinates at which the document is placed on the two-dimensional map are obtained from the number of occurrences of predetermined keywords in the document, and an image representing the document is placed at the obtained document coordinates to create a two-dimensional map. Further, in Patent Document 1, for each predetermined keyword, the keyword coordinates at which the keyword is placed on the two-dimensional map are obtained from the number of documents including the keyword, and the keyword is placed at the obtained keyword coordinates to create a two-dimensional map.
  • A prediction period and a prediction model are then designated and future prediction is performed. Based on the designated prediction period, the existing documents on the two-dimensional map are divided into time series, and a new region on the two-dimensional map, where documents expected to appear in the future will be located, is estimated from the time-series groups of existing documents and the prediction model designated by the user.
  • Keywords are also arranged on the two-dimensional map, and a keyword included in the estimated new region is regarded as a keyword expressing a document that is expected to appear in the future.
  • The included keywords are extracted as content expressing future documents.
  • In Patent Document 1, an existing document is expressed by the predetermined keywords constituting it.
  • The number of keywords constituting a document is larger than three.
  • In Patent Document 1, the user designates a prediction period and a prediction model, which enables future prediction based on the prediction model, and keywords expressing future documents can be extracted from the keywords included in the predicted region on the two-dimensional map.
  • However, in Patent Document 1, a two-dimensional map created from existing documents is used to predict the future, and the two-dimensional map that would be obtained in the future is not used. This causes the following problems.
  • The two-dimensional map usually changes as new documents are added in the future. If the number of existing documents is much larger than the number of new documents, the two-dimensional map does not change much, but the shape of the two-dimensional map used for prediction is expected to change greatly as the number of new documents increases.
  • Patent Document 1 does not mention a method for creating a static two-dimensional map that does not change in this way. This problem can occur in the same way even when the number of keywords is three or less.
  • In Patent Document 1, the user designates the prediction model when predicting the future, so the choice of prediction model depends on the user's judgment. The future image obtained may differ depending on the prediction model, so the user needs advanced knowledge of the prediction model to be used. These problems also exist for three-dimensional maps. Furthermore, the method of Patent Document 1 cannot predict the transition of elements of four or more dimensions with a single map.
  • An object of the present invention is to provide a document analysis apparatus, a document analysis method, and a document analysis program capable of predicting the trend of new documents with higher accuracy.
  • The document analysis apparatus, document analysis method, and document analysis program of the present invention can predict the future of elements of four or more dimensions using easily understandable two-dimensional or three-dimensional information.
  • According to one aspect of the present invention, there is provided a document analysis apparatus comprising: a document information storage unit storing document information including a combination of a plurality of elements; a document vector generation unit that generates, based on the document information, a document vector composed of m types of elements (m is an integer of 4 or more); a pseudo document vector generation unit that generates a pseudo document vector that is composed of the m types of elements and differs from the document vector; a projection map creation unit that calculates a mathematical distance between the document vector and the pseudo document vector and creates a projection map in which the document vector and the pseudo document vector are projected onto a two-dimensional or three-dimensional map while maintaining the relative relationship of the distances between the vectors; a vector selection unit that selects at least one pseudo document vector included in the projection map; and a component extraction unit that extracts the elements constituting the pseudo document vector selected by the vector selection unit.
  • The document analysis apparatus according to one aspect of the present invention creates a map after creating vectors of documents that may appear in the future as pseudo document vectors. Because the map is created in consideration of new documents that will appear in the future, a static map is obtained that is closer to the two-dimensional or three-dimensional map that would be created when new documents actually appear. That is, a map that is robust to the future addition of new documents is created.
  • Another aspect of the present invention may include a future area determination unit that determines, based on the transition direction of the elements constituting the document vector, a future area that is the transition destination of the elements on the projection map.
  • The vector selection unit may select at least one pseudo document vector included in the future area.
  • The document analysis apparatus may include a new document candidate creation unit that creates a new document candidate based on the elements extracted by the component extraction unit. According to this configuration, a new document candidate that is predicted to appear in the future is created using a static map, so a document that appears in the future can be predicted more accurately than before.
  • The document analysis apparatus may include a novelty determination unit that determines whether the invention indicated by the new document candidate is novel, and a new document output unit that outputs information on the new document candidate indicating an invention determined to be novel by the novelty determination unit.
  • The element may be a parameter, and the document vector creation unit may create the document vector using values obtained by normalizing each of the m types of parameters.
  • The element may be a parameter, and the document vector creation unit may create the document vector using a value obtained by multiplying the value of at least one of the m types of parameters by a predetermined value. According to this configuration, it is possible to create a map in which the effect of a specific parameter is emphasized.
  • The element may be a parameter, and when the value of any of the m types of parameters is missing, the document vector creation unit may create the document vector by substituting a temporary value for the missing parameter value. According to this configuration, it is possible to predict the future even when any value of the m types of parameters is missing.
  • The element may be a parameter, and when the value of any of the m types of parameters is specified as a range, the document vector creation unit may use at least one value included in the specified range as the value of that parameter and create at least one document vector corresponding to that value. According to this configuration, even if the value of a parameter included in an existing document is specified as a range, future prediction is possible by creating at least one virtual document vector, and robust document analysis becomes possible even for range-specified expressions.
  • The element may be a parameter, and the pseudo document vector creation unit may create the pseudo document vector by assigning a value determined by a random number to the value of at least one of the m types of parameters.
  • According to another aspect of the present invention, there is provided a document analysis method executed by a document analysis apparatus that includes a document information storage unit storing document information including a combination of a plurality of elements, a document vector generation unit, a pseudo document vector generation unit, a projection map generation unit, a vector selection unit, and a component extraction unit, wherein the document vector generation unit generates, based on the document information stored in the document information storage unit, a document vector composed of m types of elements (m is an integer of 4 or more).
  • The document analysis method according to this aspect creates a map after creating vectors of documents that may appear in the future as pseudo document vectors. Because the map is created in consideration of new documents that will appear in the future, a static map is obtained that is closer to the two-dimensional or three-dimensional map that would be created when new documents actually appear. That is, a map that is robust to the future addition of new documents is created.
  • According to still another aspect of the present invention, in a computer having a document information storage unit storing document information composed of a combination of a plurality of elements, document vectors composed of m types of elements (m is an integer of 4 or more) are created based on the document information.
  • The document analysis program according to this aspect causes the computer to execute a process of creating a map after creating vectors of documents that may appear in the future as pseudo document vectors. Because the computer creates the map in consideration of new documents that will appear in the future, a static map is obtained that is closer to the two-dimensional or three-dimensional map that would be created when new documents actually appear. That is, a map that is robust to the future addition of new documents is created.
  • According to the present invention, information of existing documents composed of elements of four or more dimensions is converted into easily understandable two-dimensional or three-dimensional information, and documents considered likely to appear in the future are created from the transition of the elements included in the existing documents to make a future prediction. That is, future trends in new documents can be predicted with higher accuracy, taking the future addition of new documents into account.
  • FIG. 1 is a diagram illustrating a configuration example of a document analysis apparatus according to the present embodiment.
  • The document analysis apparatus 1 includes a document information input unit 100, a document information storage unit 10, a document vector creation unit 101, a pseudo document vector creation unit 102, a projection map creation unit 103, a projection map display unit 104, a future area determination unit 105, a vector selection unit 109, a component extraction unit 110, a new document candidate creation unit 106, a novelty determination unit 107, and a new document output unit 108.
  • These functions are realized, for example, by a CPU (Central Processing Unit) (not shown) included in the document analysis apparatus 1 executing a program stored in a storage device such as a hard disk or a ROM (Read Only Memory).
  • The document information input unit 100 accepts input of document information including a combination of n (n is an integer of 4 or more) types of elements (for example, parameters).
  • “Document information” is a combination of n types of elements extracted from an existing document.
  • One item of document information may be extracted from one document, or a plurality of items of document information may be extracted.
  • In the present embodiment, a case where n is “10” and each document includes a combination of 10 types of parameters will be described.
  • A specific example of document information is shown in FIG.
  • One patent publication (document information) corresponds to one row, and the values of the parameters constituting the patent described in each patent publication are shown in the columns.
  • The patent described in each patent publication is expressed by 10 types of parameters, from parameter 1 to parameter 10.
  • The values of the ten types of parameters include those expressed by a single value and those expressed by a range specification.
  • A parameter value may also be absent (empty data) in the document information.
  • The document information may be input as text data in which, for example, the contents shown in FIG. 2 are expressed in CSV (Comma Separated Values) format.
  • The input document information is stored in the document information storage unit 10.
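  • As a minimal sketch of how such CSV input might be read, the following Python example parses each cell as a single value, a range, or empty data. The column layout, the sample rows, and the helper names `parse_cell` and `load_document_info` are assumptions made for illustration; the range notation follows the “1.0-3.0” style that appears in the description.

```python
import csv
import io
import re

# Range notation such as "1.0-3.0" (format assumed from the example in the text).
RANGE_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*-\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_cell(cell):
    """Return None for empty data, a float for a single value, or (low, high) for a range."""
    text = cell.strip()
    if not text:
        return None
    match = RANGE_RE.match(text)
    if match:
        return (float(match.group(1)), float(match.group(2)))
    return float(text)


def load_document_info(csv_text):
    """Parse CSV text into {publication id: [parsed parameter values]}."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row (hypothetical column names)
    documents = {}
    for row in reader:
        if not row:
            continue
        documents[row[0]] = [parse_cell(cell) for cell in row[1:]]
    return documents


# Hypothetical sample with 4 parameters instead of 10, to keep the example short.
SAMPLE = """publication,param1,param2,param3,param4
A,0.5,12,3.2,1.0-3.0
B,0.7,15,2.9,
"""

docs = load_document_info(SAMPLE)
print(docs)
```

  • Keeping ranges and empty cells as distinct values at this stage leaves the later decisions (how to expand a range, which temporary value to substitute) to the document vector creation step.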
  • The document vector creation unit 101 creates, from the document information stored in the document information storage unit 10, a document vector, which is a vector having m (m is an integer equal to or less than n) types of parameters.
  • The document vector creation unit 101 creates a document vector representing each patent publication for each patent publication represented in the document information.
  • The m types of parameters are selected from the n types of parameters included in the document information.
  • In the present embodiment, the document vector creation unit 101 creates a document vector using all the parameters included in the document.
  • Alternatively, the m types of parameters may be obtained by processing the n types of parameters using a predetermined calculation method.
  • The document vector creation unit 101 may create a document vector using values obtained by normalizing each of the m types of parameters (10 types in the present embodiment).
  • Normalization may be performed according to the actual values (values included in the document information), or according to the values that each parameter can take (including values not included in the document information).
  • When the document vector is created using values obtained by normalizing each of the m types of parameters, the bias in the values that the parameters constituting the document vector can take is corrected. The parameters are therefore treated equally, making it possible to make future predictions without bias.
  • For example, the document vector creation unit 101 first scans all patent publication data included in the document information storage unit 10 and obtains the maximum value of each of the 10 parameters. Then, for each parameter, a normalization amount is obtained such that the maximum values of all parameters become equal when multiplied by it, and a document vector for each publication is created using the parameter values multiplied by these normalization amounts.
  • In this example, the normalization amounts are obtained by scanning all patent publication data included in the document information storage unit 10, but normalization amounts prepared in advance for each of the 10 parameters may be used instead.
  • Here, the normalization amounts are applied so that the importance of each parameter is equal, but it is also possible to weight only specific parameter values, without normalizing each parameter, in order to emphasize the effect of a specific parameter. That is, the document vector creation unit 101 may create a document vector using a value obtained by multiplying the value of at least one of the m types of parameters by a predetermined value. This makes it possible to create a map that emphasizes the effect of a specific parameter and facilitates analysis of that parameter. It is also possible to weight only specific parameters after normalizing each parameter.
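  • The normalization and weighting described above can be sketched as follows. This is only an illustration under stated assumptions: the normalization amount is taken to be the reciprocal of the largest absolute value of each parameter, so that every maximum becomes 1.0, and the function names and the optional `weights` argument are hypothetical.

```python
def normalization_amounts(vectors):
    """Per-parameter factor that scales each parameter's maximum magnitude to 1.0."""
    m = len(vectors[0])
    maxima = [max(abs(v[j]) for v in vectors) for j in range(m)]
    # A parameter that is always 0 is left unscaled to avoid division by zero.
    return [1.0 / mx if mx else 1.0 for mx in maxima]


def make_document_vectors(vectors, weights=None):
    """Normalize every parameter, then optionally emphasize some with weights."""
    amounts = normalization_amounts(vectors)
    m = len(amounts)
    weights = weights or [1.0] * m
    return [[v[j] * amounts[j] * weights[j] for j in range(m)] for v in vectors]


# Hypothetical raw values for two publications and two parameters.
raw = [[2.0, 8.0], [4.0, 4.0]]
normalized = make_document_vectors(raw)
emphasized = make_document_vectors(raw, weights=[1.0, 2.0])
print(normalized)
print(emphasized)
```

  • Passing a weight greater than 1 for one parameter stretches the map along that parameter, which corresponds to emphasizing it after normalization.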
  • When the value of any of the m types of parameters is missing, the document vector creation unit 101 may create a document vector by substituting a temporary value for the missing parameter value. This makes it possible to predict the future even when a parameter value included in the document is missing.
  • For example, the parameter 10 of the patent publication B is empty data (blank).
  • In this case, the value of the parameter 10 may be uniformly set to “0”.
  • Alternatively, the value of the parameter 10 may be set to the average of the range of values that the parameter 10 can take or, if there is a value that is mainly used in general, to such a representative value.
  • A plurality of values may also be selected, each used as the parameter value, and a plurality of document vectors corresponding to the respective values created. This makes it possible to predict the future even when a parameter value included in an existing document is missing.
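  • A minimal sketch of substituting a temporary value is shown below. Missing data is represented as `None`, and the strategy names “zero” and “midpoint” are hypothetical labels for two of the options described above (a fixed value of 0, or the middle of the range of values the parameter can take).

```python
def fill_missing(vector, strategy="zero", value_ranges=None):
    """Replace None entries of one document vector with a temporary value."""
    filled = []
    for j, value in enumerate(vector):
        if value is not None:
            filled.append(value)
        elif strategy == "zero":
            filled.append(0.0)
        elif strategy == "midpoint":
            low, high = value_ranges[j]  # range of values parameter j can take
            filled.append((low + high) / 2.0)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return filled


# Hypothetical publication whose second parameter is empty.
publication_b = [1.0, None]
print(fill_missing(publication_b))
print(fill_missing(publication_b, "midpoint", [(0.0, 2.0), (2.0, 6.0)]))
```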
  • When the value of any of the m types of parameters is specified as a range, the document vector creation unit 101 may use at least one value included in the specified range as the value of that parameter and create at least one document vector corresponding to that value.
  • That is, the range-specified parameter value is expanded into a plurality of parameter values.
  • As a result, a plurality of patent publication data items are created from the one data item.
  • For example, the parameter 10 of Patent Publication A has the range notation “1.0-3.0”.
  • In this case, the document vector creation unit 101 uses the two points “1.0” and “3.0” as representative values of the parameter 10, and creates document vectors for Patent Publication A with the parameter 10 set to “1.0” and for Patent Publication A with the parameter 10 set to “3.0”.
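  • The expansion of a range-specified value into several document vectors can be sketched as follows. The representation of a range as a `(low, high)` tuple and the choice of the two end points as representative values follow the example above; the function name `expand_ranges` is an assumption.

```python
from itertools import product


def expand_ranges(vector):
    """Expand (low, high) entries into their end points and return every resulting vector."""
    choices = [list(v) if isinstance(v, tuple) else [v] for v in vector]
    return [list(combo) for combo in product(*choices)]


# Hypothetical publication A: the last parameter is given as the range 1.0-3.0.
publication_a = [0.5, 12.0, (1.0, 3.0)]
for expanded in expand_ranges(publication_a):
    print(expanded)
```

  • If several parameters of one publication are given as ranges, this sketch creates one vector per combination of end points, so the number of vectors doubles with each range-specified parameter.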
  • The document vector created by the document vector creation unit 101 is input to the pseudo document vector creation unit 102.
  • The pseudo document vector creation unit 102 creates a pseudo document vector, which is a vector of a pseudo document, based on the document vector created by the document vector creation unit 101.
  • The pseudo document vector creation unit 102 creates a pseudo document vector by assigning a value determined by a random number to the value of at least one of the m types of parameters. As a non-limiting example, “assigning a value determined by a random number (to each value of the m types of parameters)” means adding a value determined by a random number to, or subtracting it from, the value of each of the m types of parameters.
  • For example, for each of the 10 parameters constituting the document vector created by the document vector creation unit 101, the pseudo document vector creation unit 102 creates two random numbers whose maximum magnitude is ±10% of the range of values that the parameter can take, and creates pseudo document vectors by adding the created random numbers to each parameter value of the document vector.
  • However, the present invention is not limited to this.
  • For example, from the document vector $V_i = (v_1, v_2, \ldots, v_{10})$, a pseudo vector $V' = (v_1', v_2', \ldots, v_{10}')$ is obtained by adding random numbers $\alpha_1, \alpha_2, \ldots, \alpha_{10}$ to the respective parameters, and a pseudo vector $V'' = (v_1'', v_2'', \ldots, v_{10}'')$ is obtained by subtracting random numbers $\beta_1, \beta_2, \ldots, \beta_{10}$ from the respective parameters. That is, in this embodiment, since two pseudo values are created for each parameter, $2^{10} = 1024$ pseudo document vectors are created from one document vector.
  • In this example, the number of random numbers is two for each parameter. However, the present invention is not limited to this, and the number of random numbers may be one or more. The number of random numbers need not be the same for each parameter.
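  • The creation of pseudo document vectors can be sketched as below. This is an illustration under assumptions, not the patented implementation: the two random numbers per parameter are drawn uniformly between 0 and 10% of the parameter's range, one is added and the other subtracted, and every combination of the two pseudo values is enumerated. The names `make_pseudo_vectors`, `fraction`, and `rng` are hypothetical.

```python
import random
from itertools import product


def make_pseudo_vectors(vector, value_ranges, fraction=0.10, rng=None):
    """Create 2**m pseudo document vectors from one document vector of m parameters."""
    rng = rng or random.Random()
    choices = []
    for value, (low, high) in zip(vector, value_ranges):
        limit = fraction * (high - low)   # at most 10% of the parameter's range
        alpha = rng.uniform(0.0, limit)   # random number to add
        beta = rng.uniform(0.0, limit)    # random number to subtract
        choices.append((value + alpha, value - beta))
    # Each parameter takes one of its two pseudo values, giving 2**m combinations.
    return [list(combo) for combo in product(*choices)]


# Hypothetical normalized document vector with 10 parameters, each ranging over [0, 1].
document_vector = [0.5] * 10
ranges = [(0.0, 1.0)] * 10
pseudo_vectors = make_pseudo_vectors(document_vector, ranges, rng=random.Random(0))
print(len(pseudo_vectors))
```

  • Because the count grows as a power of two in the number of parameters, a practical implementation might sample only some of the combinations; the description itself allows the number of random numbers per parameter to vary.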
  • The document vector created by the document vector creation unit 101 and the pseudo document vector created by the pseudo document vector creation unit 102 are input to the projection map creation unit 103.
  • The projection map creation unit 103 creates a two-dimensional or three-dimensional projection map in which the document vectors and the pseudo document vectors are arranged on a two-dimensional plane or in a three-dimensional space based on the distances between the vectors. More specifically, for example, the projection map creation unit 103 calculates the mathematical distances between the document vectors and the pseudo document vectors, and creates a projection map in which the document vectors and the pseudo document vectors are projected onto a two-dimensional or three-dimensional map while maintaining the relative relationship of the distances between the vectors.
  • For example, the projection map creation unit 103 obtains the distances between document vectors, between pseudo document vectors, and between document vectors and pseudo document vectors, and obtains two-dimensional or three-dimensional coordinates that reproduce the magnitude relationship of the obtained distances by a technique such as multidimensional scaling.
  • The method for converting the distances between the document vectors and the pseudo document vectors into two-dimensional or three-dimensional coordinates is not particularly limited as long as it is a dimension reduction method.
  • Any method other than multidimensional scaling may be used.
  • In the present embodiment, a two-dimensional map is created by converting the distances into two-dimensional coordinates.
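  • One possible realization of this projection is classical multidimensional scaling, sketched below in plain Python. The choice of Euclidean distance, the classical (eigenvector-based) variant, and the use of power iteration to find the leading eigenvectors are all assumptions made for illustration; the description only requires some dimension reduction method that preserves the relative distances.

```python
import math
import random


def distance_matrix(vectors):
    """Euclidean distances between all pairs of vectors."""
    return [[math.dist(a, b) for b in vectors] for a in vectors]


def top_eigenpair(matrix, iters, rng):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix by power iteration."""
    n = len(matrix)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # remaining eigenvalues are numerically zero
            break
        v = [x / norm for x in w]
    bv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, bv)), v


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Coordinates in `dims` dimensions that approximately reproduce `dist`."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        eigenvalue, v = top_eigenpair(b, iters, rng)
        scale = math.sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflate so that the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigenvalue * v[i] * v[j]
    return coords


# Hypothetical 4-parameter vectors that actually lie in a plane.
vectors = [(0, 0, 0, 0), (4, 0, 0, 0), (0, 1, 0, 0), (4, 1, 0, 0)]
dist = distance_matrix(vectors)
xy = classical_mds(dist, dims=2)
print(xy)
```

  • Document vectors and pseudo document vectors would be passed together in one list, so that both kinds of point share the same map. For real data of 10 dimensions the distances can only be reproduced approximately in two dimensions.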
  • The two-dimensional or three-dimensional coordinates created by the projection map creation unit 103 are input to the projection map display unit 104.
  • The projection map display unit 104 displays the projection map created by the projection map creation unit 103. More specifically, the projection map display unit 104 places a graphic image representing a document vector or a pseudo document vector at the location on the projection map corresponding to the two-dimensional or three-dimensional coordinates obtained by the projection map creation unit 103, and displays the map.
  • FIG. 3 is an example of a two-dimensional map on which the document vectors and pseudo document vectors created for a group of existing patent publications are displayed.
  • A point highlighted by a large x mark represents a document vector (existing patent publication), and a small circle represents a pseudo document vector (pseudo patent publication).
  • By browsing this map, the user can visually understand the positional relationship between the existing patent publications and the pseudo patent publications derived from them.
  • The future area determination unit 105 determines, based on the transition direction of a predetermined parameter of the document vectors, a future area that is the transition destination of the predetermined parameter on the projection map.
  • For example, the future area determination unit 105 obtains the direction on the two-dimensional map that represents the transition of applications from the application year of each patent publication included in the document information storage unit 10 and the two-dimensional coordinates of each patent publication obtained by the projection map creation unit 103.
  • As a non-limiting example, this direction can be obtained by the least squares method. Then, starting from the coordinates of the patent publication with the latest application year, the area on the two-dimensional map ahead of the determined direction is specified as the future area.
  • FIG. 4 is a diagram showing an example of a two-dimensional map in which the trend of application transitions obtained from the application years is represented by the arrow 30.
  • FIG. 5 is a diagram illustrating an example of a two-dimensional map showing the future areas estimated from the patent publication with the newest application year. In FIG. 5, eight areas 51 to 58 lie within the region bounded by the two concentric circles 40 and 41, which have different radii and are centered on the patent publication 20 with the newest application year, ahead of the transition direction indicated by the arrow 30 on the two-dimensional map.
  • These eight areas 51 to 58 are specified as future areas. The future areas 51 to 58 represent the areas where patent publications considered likely to be filed in the future are located. That is, the pseudo patent publications included in the future areas 51 to 58 represent patent publications that are considered likely to be filed in the future.
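  • The determination of the transition direction and of a future area can be sketched as follows. Several details here are assumptions for illustration: each map coordinate is fitted against the application year by ordinary least squares, the future area is modeled as the part of the ring between two radii that lies within a fixed angle of the fitted direction, and the names `trend_direction`, `in_future_area`, and `half_angle_deg` are hypothetical.

```python
import math


def trend_direction(years, points):
    """Unit vector of the least-squares trend of (x, y) map coordinates over the years.

    Assumes at least two distinct years and a non-zero trend.
    """
    n = len(years)
    mean_t = sum(years) / n
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    var_t = sum((t - mean_t) ** 2 for t in years)
    slope_x = sum((t - mean_t) * (p[0] - mean_x) for t, p in zip(years, points)) / var_t
    slope_y = sum((t - mean_t) * (p[1] - mean_y) for t, p in zip(years, points)) / var_t
    norm = math.hypot(slope_x, slope_y)
    return (slope_x / norm, slope_y / norm)


def in_future_area(point, origin, direction, r_inner, r_outer, half_angle_deg=45.0):
    """True if `point` lies in the ring around `origin` and ahead of `direction`."""
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    r = math.hypot(dx, dy)
    if r == 0.0 or not (r_inner <= r <= r_outer):
        return False
    cos_angle = (dx * direction[0] + dy * direction[1]) / r
    return cos_angle >= math.cos(math.radians(half_angle_deg))


# Hypothetical application years and map coordinates of three existing publications.
years = [2000, 2001, 2002]
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
direction = trend_direction(years, points)
latest = points[-1]  # publication with the newest application year
print(direction)
print(in_future_area((3.0, 3.0), latest, direction, 0.5, 2.0))
```

  • Pseudo document vectors whose map coordinates pass this test would be the ones handed to the vector selection step; subdividing the ring into several sectors, as in the eight areas of FIG. 5, is a straightforward extension.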
  • In this example, the two-dimensional map is divided into regions by concentric circles centered on a specific patent publication (patent publication 20).
  • However, the division of the two-dimensional map is not limited to concentric circles.
  • As a method of dividing the map into regions, for example, division into a grid pattern as shown in FIG. 6, or another method, may be used.
  • The future area determination unit 105 may also determine the future area by accepting the user's input.
  • The arrow 30 indicating the tendency of applications may be determined as follows. In a two-dimensional map such as that shown in FIG. 3, when the application year of each patent publication is displayed near the x mark indicating the existing patent publication, the user can intuitively grasp the transition of the existing patent publications by looking at the two-dimensional map. In such a case, the user may draw the arrow 30 on the two-dimensional map with an input interface such as a mouse or a touch pen, and the future area determination unit 105 may determine the direction by accepting the user's input.
  • The vector selection unit 109 selects at least one pseudo document vector included in the projection map. More specifically, for example, the vector selection unit 109 selects at least one pseudo document vector included in the future area determined by the future area determination unit 105.
  • The component extraction unit 110 extracts the types and values of the parameters that constitute the pseudo document vector selected by the vector selection unit 109.
  • The new document candidate creation unit 106 creates a new document candidate based on the parameters, extracted by the component extraction unit 110, of the pseudo document vectors included in the future area. Specifically, the new document candidate creation unit 106 determines, for example, the range of values taken by each of the 10 parameters of the pseudo document vectors included in the future area. Then, document information whose parameter values are these ranges is created as a new document candidate (that is, a new patent publication). That is, at least one new document candidate is created from one future area.
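  • A minimal sketch of turning the pseudo document vectors of one future area into a new document candidate is shown below. Representing the candidate as one `(min, max)` range per parameter is an interpretation of the description, and the function name `candidate_ranges` is hypothetical.

```python
def candidate_ranges(pseudo_vectors):
    """Range (min, max) taken by each parameter over the selected pseudo document vectors."""
    m = len(pseudo_vectors[0])
    return [
        (min(v[j] for v in pseudo_vectors), max(v[j] for v in pseudo_vectors))
        for j in range(m)
    ]


# Hypothetical pseudo document vectors found in one future area (2 parameters shown).
selected = [[1.0, 5.0], [2.0, 4.0], [1.5, 6.0]]
candidate = candidate_ranges(selected)
print(candidate)
```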
  • The new document candidate created by the new document candidate creation unit 106 is input to the novelty determination unit 107.
  • The novelty determination unit 107 determines whether the invention indicated by the new document candidate created by the new document candidate creation unit 106 is novel. “New” means, for example, having novelty in the sense that the invention is not described in publicly known technology as of the filing date or priority date, as stipulated in Article 29, Paragraph 1 of the Japanese Patent Law.
  • For example, based on the inclusion relationship between each parameter range of the new document candidate and the document information stored in the document information storage unit 10, the novelty determination unit 107 determines that the new document candidate is new if it is not included in the stored document information. That is, whether a new document candidate is new is determined from its inclusion relationship with the values that the ten parameters of the existing patent publications can take.
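  • The inclusion test can be sketched as follows. The description does not fix the exact rule, so this example assumes one reading: a candidate is not new if some existing document covers every one of its parameter ranges, with single values treated as ranges of zero width. The function names are hypothetical.

```python
def as_range(value):
    """Treat a single value as a range of zero width."""
    return value if isinstance(value, tuple) else (value, value)


def is_contained(candidate, existing):
    """True if every parameter range of `candidate` lies within that of `existing`."""
    for cand_value, existing_value in zip(candidate, existing):
        c_low, c_high = as_range(cand_value)
        e_low, e_high = as_range(existing_value)
        if c_low < e_low or c_high > e_high:
            return False
    return True


def is_novel(candidate, existing_documents):
    """A candidate is new if no existing document contains it (assumed rule)."""
    return not any(is_contained(candidate, doc) for doc in existing_documents)


# Hypothetical existing document information with two parameters.
existing_documents = [[(1.0, 3.0), (0.0, 10.0)], [5.0, 2.0]]
print(is_novel([(1.5, 2.0), (2.0, 3.0)], existing_documents))  # covered by the first document
print(is_novel([(2.5, 3.5), (2.0, 3.0)], existing_documents))  # extends beyond every document
```

  • Other readings, such as treating any overlap with an existing document as a loss of novelty, would only change `is_contained`; the final decision may also be left to the user, as described next.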
  • Alternatively, the values of the parameters of each new document candidate may be displayed on a display or the like so that the user can judge from those values whether each candidate is novel and enter that judgment using an input interface such as a mouse or a keyboard.
  • In this case, the novelty determination unit 107 determines whether the invention indicated by each new document candidate is novel by accepting the user's input.
  • The new document output unit 108 outputs information on each new document candidate indicating an invention determined to be novel by the novelty determination unit 107.
  • The "information on a new document candidate" may be any information that can uniquely identify a document candidate indicating an invention determined to be novel.
  • For example, the value of each parameter of such a document candidate may be displayed on a display or the like, or the identification numbers of those document candidates may be displayed.
  • FIG. 7 is a flowchart showing the flow of processing in the document analysis apparatus according to this embodiment.
  • The document information input unit 100 accepts input of document information, and the input document information is stored in the document information storage unit 10 (step S101).
  • The document information stored in the document information storage unit 10 is input to the document vector creation unit 101 and also to the future area determination unit 105.
  • The document vector creation unit 101 creates a document vector, which is a vector having m types of parameters among the n types of parameters included in the document information received in step S101 (step S102).
  • The created document vector is input to the pseudo document vector creation unit 102 and the projection map creation unit 103.
  • The pseudo document vector creation unit 102 creates a pseudo document vector based on the document vector created in step S102 (step S103). The projection map creation unit 103 then creates a projection map in which the document vector and the pseudo document vector are arranged on a two-dimensional plane or in a three-dimensional space, based on the distances between the vectors obtained from the document vector created in step S102 and the pseudo document vector created in step S103 (step S104). The projection map display unit 104 displays the created projection map on a display or the like (step S105).
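The distances that feed the projection map in step S104 can be illustrated with a pairwise distance matrix. The sketch below assumes Euclidean distance and plain numeric tuples of equal length for both document vectors and pseudo document vectors; the text only speaks of a distance between vectors, so the metric is an assumption, and the projection itself (placing points in two or three dimensions while preserving relative distances) is deliberately left out.

```python
import math


def distance_matrix(vectors):
    """Return the symmetric matrix of pairwise Euclidean distances.

    vectors is assumed to be a sequence of equal-length numeric tuples
    holding both document vectors and pseudo document vectors.
    """
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(vectors[i], vectors[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix
```

A matrix of this form is the usual input to distance-preserving projection techniques such as multidimensional scaling, which is one possible way to realize the map described here.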
  • The future area determination unit 105 determines the transition direction of a predetermined parameter of the document vector and, based on that transition direction, determines the future area on the projection map created in step S104 (step S106).
  • The vector selection unit 109 selects at least one pseudo document vector included in the determined future area (step S110).
  • The component extraction unit 110 extracts the parameters constituting the pseudo document vector selected by the vector selection unit 109 (step S111).
  • The new document candidate creation unit 106 creates a new document candidate based on the parameters extracted by the component extraction unit 110 (step S107).
  • The novelty determination unit 107 determines whether the invention indicated by the new document candidate created in step S107 is novel (step S108).
  • The new document output unit 108 displays and outputs each new document indicating an invention determined to be novel (step S109).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention aims to create a static map that can accurately accommodate the future addition of new documents, by creating a two-dimensional or three-dimensional map from existing documents while taking into account data for documents expected to be created in the future. According to one embodiment of the present invention, a document vector having m types of elements (m being an integer of 4 or more) is generated on the basis of document information comprising combinations of a plurality of elements stored in a document information storage unit, and a pseudo document vector, which differs from the document vector and has m types of elements, is generated from the document vector. Furthermore, the mathematical distance between the generated document vector and the pseudo document vector is calculated, and a projection map in which the document vector and the pseudo document vector are projected onto a two-dimensional or three-dimensional map is created while preserving the relative distance relationships between the vectors. In addition, at least one pseudo document vector contained in the projection map is selected, and the elements constituting that pseudo document vector are extracted.
PCT/JP2011/005587 2010-10-05 2011-10-03 Document analysis device, document analysis method, and document analysis program WO2012046436A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2012537582A JPWO2012046436A1 (ja) 2010-10-05 2011-10-03 文書分析装置、文書分析方法および文書分析プログラム

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2010225398 2010-10-05
JP2010-225398 2010-10-05

Publications (1)

Publication Number Publication Date
WO2012046436A1 true WO2012046436A1 (fr) 2012-04-12

Family

ID=45927442

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2011/005587 WO2012046436A1 (fr) 2010-10-05 2011-10-03 Document analysis device, document analysis method, and document analysis program

Country Status (2)

Country Link
JP (1) JPWO2012046436A1 (fr)
WO (1) WO2012046436A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101697875B1 (ko) * 2015-10-30 2017-01-18 아주대학교산학협력단 그래프 모델에 기반하는 문서 분석 방법 및 그 시스템
JP2017092977A (ja) * 2011-11-21 2017-05-25 パナソニックIpマネジメント株式会社 画像処理装置および画像処理方法

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004104859A1 (fr) * 2003-05-22 2004-12-02 Fujitsu Limited Analyseur thematique
JP2004348771A (ja) * 2004-09-13 2004-12-09 Matsushita Electric Ind Co Ltd 技術文書検索装置
JP2005038240A (ja) * 2003-07-16 2005-02-10 Nissan Motor Co Ltd 特許マップ作成システム及び特許マップ作成プログラム
JP2005326897A (ja) * 2003-10-21 2005-11-24 Ipb:Kk 技術・知財評価装置及び技術・知財評価方法
WO2006112507A1 (fr) * 2005-04-20 2006-10-26 Intellectual Property Bank Corp. Dispositif d’extraction de mot d’index dans un document a examiner et analyseur de caracteristique de document
JP2007172429A (ja) * 2005-12-26 2007-07-05 Nomura Research Institute Ltd 文献情報分析装置及び文献情報分析方法
JP2008282222A (ja) * 2007-05-10 2008-11-20 Internatl Business Mach Corp <Ibm> 未来技術動向予測支援装置、方法、プログラム及び未来技術動向予測支援サービスを提供する方法
JP2010205006A (ja) * 2009-03-04 2010-09-16 Nec Corp 未来表現収集システム、未来表現収集方法および未来表現収集用プログラム

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TOSHIYUKI ANDO ET AL.: "Visualization of patent information using text mining tools and the R statistical language", JOURNAL OF INFORMATION PROCESSING AND MANAGEMENT, vol. 52, no. 1, 1 April 2009 (2009-04-01), pages 20 - 31 *
TSUTOMU KIRIYAMA ET AL.: "A small road to some philosophy of the patent analysis", JOURNAL OF INFORMATION PROCESSING AND MANAGEMENT, vol. 52, no. 5, 1 August 2009 (2009-08-01), pages 286 - 299 *

Also Published As

Publication number Publication date
JPWO2012046436A1 (ja) 2014-02-24

Similar Documents

Publication Publication Date Title
Berger et al. Uncertainty‐aware exploration of continuous parameter spaces using multivariate prediction
JP6020161B2 (ja) グラフ作成プログラム、情報処理装置、およびグラフ作成方法
US8345042B2 (en) Mesh-based shape retrieval system
Finnegan et al. Maximum entropy methods for extracting the learned features of deep neural networks
JP5787843B2 (ja) 手書き描画装置、方法及びプログラム
JP5003499B2 (ja) 多目的最適化設計支援装置、方法、及びプログラム
JP4934058B2 (ja) 共クラスタリング装置、共クラスタリング方法、共クラスタリングプログラム、および、そのプログラムを記録した記録媒体
US20150170384A1 (en) Apparatus and method for creating drawing data superimposing grouped data on a screen
JP2019045894A (ja) 検索プログラム、検索方法、及び、検索プログラムが動作する情報処理装置
JP2012523033A (ja) 臓器の区分けのための相互作用的なicpアルゴリズム
JP2020128962A (ja) 材料特性予測装置および材料特性予測方法
JP6647992B2 (ja) 設計支援装置
JP2017146888A (ja) 設計支援装置及び方法及びプログラム
Gadakh et al. Selection of cutting parameters in side milling operation using graph theory and matrix approach
JP5163472B2 (ja) パラメタ空間を分割してモデル化する設計支援装置、方法、及びプログラム
EP3740885A1 (fr) Conception d'ensembles de modèles numériques mettant en oeuvre des informations dérivées de processus de conception antérieurs
JP6330665B2 (ja) 可視化装置、可視化方法および可視化プログラム
JP3978962B2 (ja) 情報検索方法および情報検索装置
WO2012046436A1 (fr) Document analysis device, document analysis method, and document analysis program
JP2011253477A (ja) 設計支援装置及び設計支援方法
JP4397264B2 (ja) 技術文献の市場性分析システム及び市場性分析プログラム
Vergeest et al. Fitting freeform shape patterns to scanned 3D objects
JP2009252185A (ja) 情報検索装置、情報検索方法、制御プログラム及び記録媒体
Barbosa et al. Using performance profiles for the analysis and design of benchmark experiments
JP7270454B2 (ja) 設計支援システム、設計支援方法および設計支援プログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11830370

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2012537582

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 11/06/2013)

122 Ep: pct application non-entry in european phase

Ref document number: 11830370

Country of ref document: EP

Kind code of ref document: A1