WO2020208693A1 - Document information evaluation device, document information evaluation method, and document information evaluation program - Google Patents
- Publication number
- WO2020208693A1 (application PCT/JP2019/015368)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- document information
- information
- input
- unit
- document
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/418—Document matching, e.g. of document images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/131—Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
- G06Q50/184—Intellectual property management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/414—Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
Definitions
- The present invention relates to a document information evaluation device, a document information evaluation method, and a document information evaluation program.
- Patent Document 1 states that the weight of each piece of document information is calculated from the frequency of appearance of keywords in each segment (constituent unit) making up the document information, and that similarity is scored against a predetermined standard based on that weight. According to the system of Patent Document 1, because similarity is scored for each segment of the document information, a search can reliably succeed even when content close to the conditions is described in only part of the document information.
- However, with the system of Patent Document 1, document information may be retrieved in which only a part contains content close to the conditions, so that its similarity as a whole is low.
- When document information that the user does not intend is retrieved, the user must start over in selecting the keywords, phrases, sentences, and so on used as search conditions. The selection of search conditions is then repeated many times until document information with content close to the conditions is retrieved. This takes a lot of time and is very burdensome for the user.
- In view of the above problems, an object of the present invention is to provide a document information evaluation device, a document information evaluation method, and a document information evaluation program that improve the accuracy of a search and reduce the time required to search for document information having content close to the conditions, thereby making the search more efficient.
- The document information evaluation device includes: an information acquisition unit that acquires, from a user terminal operable by the user, input information entered at that terminal; a storage unit that stores a plurality of pieces of document information; a calculation unit that decomposes the input information into predetermined constituent units and calculates, as a score, the degree of matching between each decomposed constituent unit and one of the pieces of document information stored in the storage unit; an output unit that, based on the score, outputs a comparison table showing the degree of difference between the input information and the document information for each constituent unit; and an input unit through which the user's self-evaluation of the document information is entered into the comparison table.
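The flow from score calculation to the comparison table can be sketched as follows. This is a minimal illustration, not the scoring method of the invention: the word-overlap score, the names `matching_score`, `Row`, `build_comparison_table`, and `record_self_evaluation`, and the choice of "difference = 1 - score" are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


def matching_score(unit: str, document: str) -> float:
    """Assumed stand-in score: share of the unit's words found in the document."""
    unit_words = set(re.findall(r"\w+", unit.lower()))
    doc_words = set(re.findall(r"\w+", document.lower()))
    if not unit_words:
        return 0.0
    return len(unit_words & doc_words) / len(unit_words)


@dataclass
class Row:
    unit: str                              # one constituent unit of the input
    score: float                           # degree of matching, 0.0 to 1.0
    difference: float                      # degree of difference shown in the table
    self_evaluation: Optional[str] = None  # entered later by the user


def build_comparison_table(units: list[str], document: str) -> list[Row]:
    """Score every constituent unit against one stored document."""
    rows = []
    for unit in units:
        score = matching_score(unit, document)
        rows.append(Row(unit, score, 1.0 - score))
    return rows


def record_self_evaluation(table: list[Row], index: int, evaluation: str) -> None:
    """Store the user's self-evaluation for one row of the table."""
    table[index].self_evaluation = evaluation
```

Each row keeps the score and the user's evaluation side by side, so a later recalculation could read both.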
- The output unit may switch between, and output, a high evaluation indicating that the document information is good and a low evaluation indicating that it is not good, according to the self-evaluation entered through the input unit.
- The output unit may switch each piece of document information between the high evaluation and the low evaluation for each constituent unit and output the result.
- The output unit can output, in a comparison table, the degree of difference between the input information and the plurality of pieces of document information for each constituent unit of the input information.
- The output priority of the plurality of pieces of document information may be determined by whether the score calculated for each constituent unit satisfies a predetermined criterion.
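One way to read the priority rule above is to rank documents by how many constituent units meet the criterion. The sketch below assumes that reading; the function name `rank_documents`, the default criterion of 0.5, and the tie-breaking by total score are hypothetical choices, not taken from the text.

```python
def rank_documents(scores_by_document: dict[str, list[float]],
                   criterion: float = 0.5) -> list[str]:
    """Order document IDs by the number of constituent units meeting the criterion."""
    def sort_key(item):
        doc_id, scores = item
        satisfied = sum(1 for s in scores if s >= criterion)
        # More satisfied units first; ties broken by total score, then by ID.
        return (-satisfied, -sum(scores), doc_id)

    return [doc_id for doc_id, _ in sorted(scores_by_document.items(), key=sort_key)]
```

For example, a document with two units above the criterion is output before a document with one very high unit and two low ones.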
- The input information and the plurality of pieces of document information may include information related to intellectual property.
- The calculation unit may reflect the switching of the self-evaluation mode, which indicates the user's self-evaluation, and recalculate the degree of matching of the document information with respect to the input information.
- The device may further include a document information fixing unit that fixes, as main document information, at least one piece of document information desired by the user from among the plurality of pieces of document information output to the comparison table. The calculation unit may then recalculate the degree of matching of the document information with respect to the input information based on the main document information fixed by the document information fixing unit.
- In the document information evaluation method, a computer executes: an information acquisition step of acquiring input information entered at a user terminal operable by the user; a storage step of storing a plurality of pieces of document information; a calculation step of decomposing the input information into predetermined constituent units and calculating, as a score, the degree of matching between each decomposed constituent unit and one of the pieces of document information stored in the storage step; an output step of outputting, based on the score, a comparison table showing the degree of difference between the input information and the document information for each constituent unit; and an input step in which the user's self-evaluation of the document information is entered into the comparison table.
- The document information evaluation program causes a computer to execute: an information acquisition function of acquiring input information entered at a user terminal operable by the user; a storage function of storing a plurality of pieces of document information; a calculation function of decomposing the input information into predetermined constituent units and calculating, as a score, the degree of matching between each decomposed constituent unit and one of the pieces of document information stored by the storage function; an output function of outputting, based on the score, a comparison table showing the degree of difference between the input information and the document information for each constituent unit; and an input function by which the user's self-evaluation of the document information is entered into the comparison table.
- According to the present invention, it is possible to provide a document information evaluation device, a document information evaluation method, and a document information evaluation program that improve the accuracy of a search, reduce the time required to search for document information having content close to the conditions, and make the search more efficient.
- FIG. 1 is a block diagram showing an example of a software configuration of the document information evaluation device 1 according to the embodiment of the present invention.
- The document information evaluation device 1 has an information acquisition unit 101, a storage unit 102, a calculation unit 103, an output unit 104, and an input unit 105.
- In the present embodiment, each of these functional units is described as a functional module realized by an information processing program (software) that controls the document information evaluation device 1.
- The document information evaluation program runs on the document information evaluation device 1. That is, the document information evaluation device 1 is the device on which the document information evaluation program runs.
- For input information describing the content the user wants to search for, the output unit 104 outputs, as a comparison table 100 (see FIG. 3), the result of calculating the degree of matching between the input information and the plurality of pieces of document information as a score. Based on the output result, the user can enter a self-evaluation into the comparison table 100. As a result, the document information evaluation device 1 can improve the accuracy of the search and reduce the time required to find document information with content close to the conditions the user wants, making the search more efficient.
- The document information evaluation device 1 connects to the user terminal 2 via a network (NW) and provides the service of the document information evaluation system to the user terminal 2.
- The document information evaluation device 1 is, for example, a so-called server device or a computer (for example, a desktop, a laptop, or a tablet). In one embodiment of the present invention, the document information evaluation device 1 is not limited to these.
- The information acquisition unit 101 acquires, from the user terminal 2, the information related to intellectual property entered at the user terminal 2.
- The user terminal 2 is a terminal device that can be operated by the user, for example a desktop PC, a notebook PC, a tablet PC, or a smartphone.
- The following describes a case where the input information and the document information are information related to intellectual property.
- Intellectual property is an idea or a creative work produced by human intellectual activity.
- Intellectual property is, for example, an invention, a device, a design, a trademark, a copyrighted work, a circuit arrangement, or a new variety of plant.
- The intellectual property may be represented by, for example, a document explaining its content; a figure, table, graph, sketch, or photograph (figures, etc.) explaining its content; or a document or the like relating to such figures, etc.
- The information regarding intellectual property in the present embodiment is the information from which the content the user wants to search for is extracted, as described above.
- Information on intellectual property includes not only information for which rights have been acquired, but also published information before the acquisition of rights, unpublished information, and invention information before an application is filed.
- Information for which rights have been acquired is, for example, information for which a patent right, a utility model right, a design right, a trademark right, a copyright, a circuit layout use right, a breeder's right, or the like has been established.
- If the intellectual property is an invention, the input information and the document information are information such as text (claims, subject of the invention, purpose of the invention, etc.) or drawings indicating the content of the invention.
- If the intellectual property is a design, the input information and the document information are information such as a shape, a pattern, or a color, or a drawing relating to a combination of these. If the intellectual property is a trademark, the input information and the document information are identification marks of goods or services.
- The storage unit 102 holds the document information.
- Document information includes various data transmitted and received via the NW, for example text data and numerical data.
- Text data includes, for example, information about intellectual property, idea sheets, idea memos, information related to litigation, papers, books (including magazines and weekly magazines), reports, and web pages.
- Numerical data includes, for example, experimental data, measurement data, statistical data, and inspection data.
- The text data may also include mathematical formula data, chart data, photographic data, and image data (including still images and moving images).
- The information on intellectual property may include information before the acquisition of rights, as described above.
- Information before the acquisition of rights is, for example, information recording the process of creating an invention or design, the materials or devices prepared for experiments, experimental results, the title of the research and development, and the purpose of the research and development, together with ancillary information such as the engineer's name, the engineer's affiliation, and the project number.
- Ancillary information may include access authority information for the information about the acquired intellectual property.
- Access authority is the authority to execute processing on information, such as viewing, editing, deleting, and authentication. For example, the engineer who stored the information on intellectual property is granted access authority to execute all processing, an engineer who collaborated in creating the intellectual property is granted access authority to execute viewing, and a certifier (described later) who authenticates the information about the intellectual property is granted access authority to execute authentication.
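The role-based access authority described above could be modeled with bit flags, as in the following sketch. The role names (`author`, `collaborator`, `certifier`), the `Access` flag class, and the `can` helper are hypothetical names introduced for illustration; the text only specifies which kinds of processing each kind of person may execute.

```python
from enum import Flag, auto


class Access(Flag):
    NONE = 0
    VIEW = auto()
    EDIT = auto()
    DELETE = auto()
    AUTHENTICATE = auto()


# Assumed mapping of roles to rights, following the example in the text.
ROLE_RIGHTS = {
    # Engineer who stored the information: all processing.
    "author": Access.VIEW | Access.EDIT | Access.DELETE | Access.AUTHENTICATE,
    # Engineer who collaborated in the creation: viewing only.
    "collaborator": Access.VIEW,
    # Certifier: authentication processing.
    "certifier": Access.AUTHENTICATE,
}


def can(role: str, action: Access) -> bool:
    """Return True if the role holds the requested access right."""
    return bool(ROLE_RIGHTS.get(role, Access.NONE) & action)
```

Unknown roles fall back to `Access.NONE`, so every check on them returns `False`.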
- The information acquisition unit 101 may acquire such ancillary information as information regarding intellectual property.
- The information about intellectual property entered by the user is acquired from the user terminal 2.
- The present embodiment describes the case where the intellectual property is an invention, but the intellectual property is not limited to an invention. For example, the creation of intellectual property may include the selection of an identification mark for a trademark.
- The calculation unit 103 calculates the degree of matching for the input information entered at the user terminal 2 based on the document information stored in the storage unit 102. Specifically, it decomposes the input information into predetermined constituent units and, for each decomposed constituent unit, calculates as a score the degree of matching with one of the plurality of pieces of document information stored in the storage unit 102.
- The decomposition into constituent units, for example, segments the constituent requirements of "information about intellectual property related as an inventor, creator, or applicant" at each punctuation mark. Alternatively, segments may be formed at a fixed sentence length or at each predicate.
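The two simpler decomposition rules (by punctuation mark and by fixed length) can be sketched as below. The function names, the set of punctuation characters, and the 40-character default are assumptions for illustration; segmentation at each predicate would need a syntactic parser and is not shown.

```python
import re
import textwrap


def split_on_punctuation(text: str) -> list[str]:
    """Cut the text at Western and Japanese punctuation marks."""
    parts = re.split(r"[,;:.、。]", text)
    return [part.strip() for part in parts if part.strip()]


def split_by_length(text: str, max_chars: int = 40) -> list[str]:
    """Cut the text into chunks of at most max_chars, breaking at spaces."""
    return textwrap.wrap(text, width=max_chars)
```

Note that the punctuation rule as written also splits decimal numbers such as "0.5", so real claim text would need a more careful pattern.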
- The output unit 104 acquires similar information and calculates the degree of approximation between the constituent requirements and the similar information. If a keyword in the similar information that is the same as or similar to a keyword of the constituent requirement is a subordinate concept of it, the degree of matching may be judged to be high.
- The calculation unit 103 can determine the presence or absence of a subordinate or superordinate concept using a corpus dictionary of words stored in advance in the storage unit 102.
- The calculation unit 103 calculates the degree of matching between the constituent requirements and the similar information, determines by threshold processing whether they match, and identifies the matching points and the differences of the constituent requirements.
- The threshold value may be predetermined, or a score calculated by machine learning may be used.
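The subordinate-concept check and the threshold processing can be illustrated with a small sketch. The dictionary contents, the names `BROADER`, `keyword_match`, and `classify`, the score of 1.0 for a subordinate concept, and the threshold of 0.5 are all assumptions; the text does not specify the dictionary format or any numeric values.

```python
# Hypothetical word dictionary: narrower term -> broader term.
BROADER = {
    "smartphone": "terminal",
    "tablet": "terminal",
    "led": "light source",
}


def keyword_match(requirement_kw: str, document_kw: str) -> float:
    """Return a high score if the keywords are identical or if the document
    keyword is a subordinate concept of the requirement keyword."""
    req, doc = requirement_kw.lower(), document_kw.lower()
    if req == doc:
        return 1.0
    if BROADER.get(doc) == req:
        return 1.0
    return 0.0


def classify(unit_scores: dict[str, float], threshold: float = 0.5):
    """Threshold processing: split constituent units into matches and differences."""
    matches = [unit for unit, score in unit_scores.items() if score >= threshold]
    differences = [unit for unit, score in unit_scores.items() if score < threshold]
    return matches, differences
```

The check is deliberately one-directional: a document that mentions only the broader term does not count as disclosing the narrower one.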
- The calculation unit 103 can calculate the score of the degree of matching between the input information received from the user terminal 2 and the information about intellectual property by using a model trained by machine learning on the information on intellectual property stored in the storage unit 102. As a result, the document information evaluation device 1 can calculate the score of the degree of matching based on the information about intellectual property (for example, past patent information) more quickly, accurately, and easily.
- The information about intellectual property used in machine learning is quantified in advance for each item before processing, and the input information about intellectual property entered at the user terminal 2 is quantified in the same way before the score of the degree of matching is calculated.
- The items quantified in advance may be, for example, various types of information associated with official gazettes relating to intellectual property.
- Such information includes, for example, the publication date of the gazette, the filing date of the application documents for the gazette, the number of notices of reasons for refusal received in the application, the content of those notices, the content of the responses to them, the number of amendments made in the application, the content of the amendments, the number of characters in the independent claim, and the number of claims.
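As a rough illustration of quantifying items in advance, the sketch below turns the directly countable items into a numeric feature vector. The record keys and the function name `quantify` are hypothetical, dates are encoded as day ordinals purely as an assumed convention, and free-text items (such as the content of a notice of reasons for refusal) are left out because the text does not say how they are quantified.

```python
from datetime import date


def quantify(record: dict) -> list[float]:
    """Convert one gazette record into a fixed-order numeric feature vector."""
    return [
        float(record["publication_date"].toordinal()),  # publication date
        float(record["filing_date"].toordinal()),       # filing date
        float(record["refusal_notices"]),               # notices of reasons for refusal
        float(record["amendments"]),                    # number of amendments
        float(len(record["independent_claim"])),        # characters in the independent claim
        float(record["claim_count"]),                   # number of claims
    ]
```

Applying the same function to stored documents and to the user's input keeps both in the same feature order.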
- The calculation unit 103 can store, as feedback, the actual outcomes for information related to intellectual property whose score of the degree of matching was calculated by the document information evaluation device 1 in the past, and use them for machine learning. As a result, the document information evaluation device 1 can extract similar prior art documents with higher accuracy.
- The calculation unit 103 calculates a score of the degree of matching for the input information related to intellectual property entered by the user at the user terminal 2, and if the difference is high (the matching points are low), it may extract a new keyword and calculate the score of the degree of matching again for each constituent requirement. For example, the calculation unit 103 can recalculate the score using a new keyword extracted in place of, or in addition to, a keyword used in the information about intellectual property entered at the user terminal 2. The calculation unit 103 can repeat the keyword extraction until the score of the degree of matching becomes high. When a high score is obtained, the keyword extracted at that point can be output from the output unit 104 to the user terminal 2.
- The keyword may be extracted at random from the document information about intellectual property stored in the storage unit 102, or may be extracted from the input information about intellectual property entered at the user terminal 2.
- A keyword extraction method may be determined in advance using, for example, the Osborn checklist, and extraction may be performed based on that method.
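The repeat-until-high loop described above might look like the following sketch. It covers only the "in addition to" case with a greedy strategy; the function name `refine_keywords`, the target of 0.7, the round limit, and the idea of passing the scorer in as `score_fn` are assumptions made for the example, and how candidates are chosen is left to the caller.

```python
from typing import Callable


def refine_keywords(keywords: list[str],
                    candidates: list[str],
                    score_fn: Callable[[list[str]], float],
                    target: float = 0.7,
                    max_rounds: int = 10) -> tuple[list[str], float]:
    """Try candidate keywords one at a time, keeping those that raise the score,
    until the score reaches the target or the round limit is hit."""
    current = list(keywords)
    best = score_fn(current)
    for candidate in candidates[:max_rounds]:
        if best >= target:
            break  # score is already high enough; stop extracting
        trial = current + [candidate]
        trial_score = score_fn(trial)
        if trial_score > best:
            current, best = trial, trial_score
    return current, best
```

The keywords that were added (those in the result but not in the original list) are what would be reported back to the user terminal.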
- The document information evaluation device 1 thus not only calculates the score of the degree of matching for the information about intellectual property entered at the user terminal 2, but can also present to the user how to increase the degree of matching, so that the content desired by the user can be searched for accurately.
- Keywords are sentences, phrases, idioms, words, symbols, letters, chemical formulas, numbers, and the like.
- Keywords can also be extracted from document information on intellectual property stored in the storage unit 102 that has high matching points (low difference). In that case, it is also possible to extract a keyword located at the edge of the distribution of cases in which the score of the degree of matching is high when the extracted keyword is used in place of, or in addition to, a keyword used in the input information on intellectual property entered at the user terminal 2.
- The document information evaluation device 1 can present keywords that lower the score of the degree of matching when the information about intellectual property is limited by the extracted keywords. For example, when the information on intellectual property is an invention for which a patent application is to be filed, the document information evaluation device 1 can extract keywords that support the possibility of acquiring rights to the invention while limiting the invention as little as possible.
- Based on the learning data and information on the possibility of acquiring rights, the calculation unit 103 calculates, from the input information on intellectual property, a new keyword to be added to the information on intellectual property, and the output unit 104 can output the new keyword.
- The document information evaluation device 1 can express information about intellectual property in a way that is easy for the user to understand, so that even a user with little knowledge of intellectual property law can easily understand it.
- The calculation unit 103 may obtain new similar patents via the output unit 104 for constituent requirements with a high difference (low matching points). Specifically, the calculation unit 103 determines that the matching points are low and instructs the output unit 104 to output new similar information, and the output unit 104 may obtain new similar information for the constituent unit with low matching points.
- The calculation of the degree of matching by the calculation unit 103 is executed when the user presses the search button 109 (see FIG. 3), which inputs a search signal to the calculation unit 103.
- The output unit 104 outputs the evaluation result of the document information to the user terminal 2.
- The evaluation result of the document information is a comparison table in which the degree of difference from document information similar to the input information (hereinafter referred to as “similar information”) is compared for each constituent unit.
- The evaluation result of the document information is not limited to the comparison table.
- It may be, for example, a mock notice of reasons for refusal (a mock notice that resembles a notice of reasons for refusal), or information on intellectual property related to the inventor or applicant.
- The information regarding intellectual property related as the inventor or the applicant is, for example, an invention memo in which invention information is described, or claim information. Similar information is, for example, prior art documents, and the comparison table 100 (see FIG. 3) is a so-called claim chart showing technical differences.
- A self-evaluation mode indicating the user's self-evaluation is input to the input unit 105.
- The user can select the self-evaluation mode that indicates the user's self-evaluation.
- The self-evaluation mode can be selected with the self-evaluation mode changeover switch 115 (see FIG. 3).
- The output unit 104 displays the self-evaluation mode changeover switch 115 on the comparison table 100. Details will be described later.
- The functional units described above (the information acquisition unit 101, the storage unit 102, the calculation unit 103, the output unit 104, and the input unit 105) are examples of the functions of the document information evaluation device 1 and do not limit its functions.
- The document information evaluation device 1 does not have to have all of the above functions and may have only some of them. It may also have functions other than those described above.
- The document information evaluation device 1 may have an input function for configuring its functions and an output function for indicating the operating state of the device with an LED lamp or the like.
- Each of the above functional units of the document information evaluation device 1 has been described as being realized by software. However, at least one of the functional units may be realized by hardware.
- Any of the above functional units may be implemented by dividing it into a plurality of functional units, and any two or more of them may be integrated into one. That is, FIG. 1 represents the functions of the document information evaluation device 1 as functional blocks and does not indicate, for example, that each functional unit is a separate program file.
- The document information evaluation device 1 may be a device realized in one housing or a system realized by a plurality of devices connected via a network or the like.
- The document information evaluation device 1 may realize some or all of its functions with a virtual device such as a cloud service provided by a cloud computing system. That is, at least one of the above functional units may be realized in another device.
- The document information evaluation device 1 may be a general-purpose computer such as a server device, or a dedicated device with limited functions.
- FIG. 2 is a block diagram showing an example of the hardware configuration of the document information evaluation device 1 according to the embodiment of the present invention.
- the document information evaluation device 1 has a CPU (Central Processing Unit) 11, a RAM (Random Access Memory) 12, a ROM (Read Only Memory) 13, a touch panel 14, and a communication I / F (Interface) 15.
- the document information evaluation device 1 is a device that executes the information processing program described with reference to FIG.
- the CPU 11 controls the document information evaluation device 1 by executing an information processing program stored in the RAM 12 or the ROM 13.
- The document information evaluation program is acquired from, for example, a storage medium storing the program or a program distribution server via a network, installed in the ROM 13, and read and executed by the CPU 11.
- the touch panel 14 has an operation input function and a display function (operation display function).
- the touch panel 14 enables the user of the document information evaluation device 1 to input an operation using a fingertip, a touch pen, or the like.
- The present embodiment describes the case where the document information evaluation device 1 uses the touch panel 14 having the operation display function, but the document information evaluation device 1 may instead have a display device with the display function and a separate operation input device with the operation input function.
- In that case, the display screen of the touch panel 14 can be implemented as the display screen of the display device, and operations on the touch panel 14 can be implemented as operations on the operation input device.
- the touch panel 14 may be realized in various forms such as a head-mounted type, a glasses type, and a wristwatch type display.
- The communication I/F 15 is an interface for communication.
- The communication I/F 15 performs communication via, for example, a wireless LAN, a wired LAN, or short-range wireless communication such as infrared.
- The communication I/F 15 realizes communication with the user terminal 2 via, for example, the NW.
- The communication I/F 15 may also realize communication with another document information evaluation device 1. Although FIG. 2 shows only the communication I/F 15, the document information evaluation device 1 may have a separate communication I/F for each of a plurality of communication methods.
- FIG. 3 is a schematic view showing an example of an output screen according to an embodiment of the present invention.
- a self-evaluation mode indicating the user's self-evaluation is input to the input unit 105.
- When the user presses the self-evaluation mode changeover switch 115, which indicates the self-evaluation of the document information as similar information in the comparison table 100, the self-evaluation command signal is input to the input unit 105.
- the output unit 104 displays and outputs the self-evaluation mode changeover switch 115 indicating the self-evaluation of the document information by the user to the comparison table 100.
- the self-evaluation command signal is input to the input unit 105 by clicking the self-evaluation mode changeover switch 115 with the mouse or operating the keyboard.
- the output unit 104 can output a self-evaluation to the comparison table 100 based on the selection by the user.
- More specifically, the output unit 104 displays the self-evaluation mode changeover switch 115 in the comparison table 100, and the user selects the switch by a mouse click or a keyboard operation. Based on that selection operation, the output unit 104 can display, for example, a schematically designed figure (image) as the self-evaluation mode. By operating the self-evaluation mode changeover switch 115 to select the self-evaluation mode, the user gives an instruction to select the self-evaluation, and the input unit 105 receives the self-evaluation command signal.
- The user can select the self-evaluation mode indicating the self-evaluation by clicking the self-evaluation mode changeover switch 115 with the mouse. Then, based on the self-evaluation command signal from the input unit 105, the output unit 104 can switch the output, depending on the result of the self-evaluation, between a high evaluation mode indicating that the document information is good and a low evaluation mode indicating that it is not.
- the self-evaluation mode changeover switch 115 may be, for example, a button, an icon, or the like, as long as it can be switched and output.
- For example, the high evaluation mode is the "Like!" function 115a, which indicates that the user rates the document information highly, and the low evaluation mode is the "No good!" function 115b, which indicates that the user rates it poorly; the output can be switched between the two.
- The output unit 104 can change the display mode of the "Like!" function 115a and the "No good!" function 115b when outputting them.
- The display mode of the "Like!" function 115a can be a figure of a pose taken when expressing a positive event, for example a clenched hand with the thumb pointing upward.
- Alternatively, a figure with a smiling expression or a figure cheering with raised arms may be output.
- The display mode of the "Like!" function 115a may also be configured to output a figure of "○".
- The display mode of the "No good!" function 115b can be a figure of a pose taken when expressing a negative event, for example a clenched hand with the thumb pointing downward. Alternatively, a figure with a pessimistic facial expression or a drooping figure expressing disappointment may be output. The display mode of the "No good!" function 115b may also be configured to output a figure of "x".
- The display mode of the "Like!" function 115a and the "No good!" function 115b is not limited to changing the shape of the switch.
- The output unit 104 may instead be configured to change the color of the switch between the "Like!" function 115a and the "No good!" function 115b.
- The self-evaluation mode ("Like!" function 115a or "No good!" function 115b) can be switched for each piece of document information, or for each constituent unit (Element) of the input information. Because the self-evaluation mode can be switched for each constituent unit, the user can see at a glance which constituent units score high and which score low.
- The output unit 104 may select the aspect of the character 3 based on the new similar information. Specifically, it may be configured to select which emotional output mode of the character to use depending on the self-evaluation mode ("Like!" function 115a or "No good!" function 115b). For example, when the self-evaluation mode is the "Like!" function 115a, the character may be displayed in the output mode of "joy" or "comfort". When the self-evaluation mode is the "No good!" function 115b, the character may be displayed in the output mode of "anger" or "sorrow". This display output is executed by the output unit 104.
- the output unit 104 can output and control the aspect of the character 3 based on the score result of the degree of matching calculated by the calculation unit 103.
- the document information evaluation device 1 can express the information related to the intellectual property using the character 3. Even users with little knowledge of intellectual property law can understand information about intellectual property in an easy-to-understand manner.
- the output unit 104 can output a plurality of document information as similar patents output in the comparison table 100. Then, the output unit 104 outputs the degree of difference (matching degree) between the input information and the plurality of document information to the comparison table 100 for each constituent unit (Element).
- Although FIG. 3 shows an example in which five pieces of document information are output, the number of pieces of document information is not limited.
- The output priority of the five pieces of document information is determined by whether or not the score indicating the degree of matching calculated for each constituent unit meets a predetermined criterion.
- The predetermined criterion may be, for example, to output the top five pieces of document information from the left column to the right column in descending order of the total of the scores of the constituent units.
- The predetermined criterion may instead be to output the top five pieces of document information from the left column to the right column in descending order of the average of the scores of the constituent units.
- The predetermined criterion may also be to output, from the left column to the right column in descending order, the top five pieces of document information for which the average score of some arbitrary structural units, among all the structural units into which the input is divided, is equal to or higher than a predetermined value.
- The arbitrary structural units may be specified by acquiring the structural units that the user inputs from the user terminal 2.
- The information acquisition unit 101 acquires the arbitrary structural units. The arbitrary structural units may also be stored in the storage unit 102 in advance.
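- The three ordering criteria above (total, average, and average over user-specified structural units with a threshold) can be sketched as follows. This is a minimal illustration only: the function name `rank_documents`, the dictionary layout, and the sample scores are assumptions and do not appear in the specification.

```python
from statistics import mean


def rank_documents(scores, criterion="total", units=None, threshold=0.0, top_n=5):
    """Return document ids in left-to-right column order.

    scores: {doc_id: {unit_name: matching score in percent}} (assumed layout)
    criterion: "total", "average", or "subset" (average over `units` only,
               keeping documents whose subset average is >= threshold)
    """
    def key(doc_id):
        unit_scores = scores[doc_id]
        if criterion == "total":
            return sum(unit_scores.values())
        if criterion == "average":
            return mean(unit_scores.values())
        if criterion == "subset":
            return mean(unit_scores[u] for u in units)
        raise ValueError(f"unknown criterion: {criterion}")

    candidates = list(scores)
    if criterion == "subset":
        # Keep only documents that reach the predetermined value.
        candidates = [d for d in candidates if key(d) >= threshold]
    return sorted(candidates, key=key, reverse=True)[:top_n]


# Hypothetical per-unit scores for three prior art documents.
scores = {
    "D1": {"A": 80, "B": 40, "C": 60},
    "D2": {"A": 50, "B": 90, "C": 70},
    "D3": {"A": 30, "B": 20, "C": 95},
}
ranked_total = rank_documents(scores, "total")
ranked_subset = rank_documents(scores, "subset", units=["A", "B"], threshold=50)
```

- With these sample values, ordering by total places D2 first, and restricting to units A and B with a threshold of 50 drops D3.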
- The document information with the highest average matching score over the constituent units of the invention may be selected as the main reference.
- Points of coincidence and difference between the invention and the main reference may be determined by whether or not the matching score of each constituent unit of the invention is equal to or higher than a predetermined value.
- From the document information other than the main reference, a similar prior art document with a high matching score for a structural unit that has a low matching score in the main reference may be selected as a sub-reference.
- The comparison table 100 may display the main reference and the sub-reference among the prior art documents, together with the structural units related to the main reference and the structural units related to the sub-reference.
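- The selection of a main reference and sub-references described above could be sketched as below. The name `select_references`, the threshold value, and the sample scores are hypothetical; the specification states only that the main reference has the highest average score and that sub-references cover units that score low in the main reference.

```python
from statistics import mean


def select_references(scores, threshold=50):
    """Pick a main reference and, for each weakly covered unit, a sub-reference.

    scores: {doc_id: {unit_name: matching score in percent}} (assumed layout)
    threshold: assumed cut-off; a unit scoring below it in the main
               reference is treated as a difference.
    """
    # Main reference: highest average score over all constituent units.
    main = max(scores, key=lambda d: mean(scores[d].values()))
    others = [d for d in scores if d != main]
    sub = {}
    for unit, value in scores[main].items():
        if value >= threshold or not others:
            continue  # unit is treated as disclosed by the main reference
        # Sub-reference: the other document that scores highest on this unit.
        sub[unit] = max(others, key=lambda d: scores[d][unit])
    return main, sub


scores = {
    "D1": {"A": 80, "B": 40, "C": 60},
    "D2": {"A": 50, "B": 90, "C": 70},
    "D3": {"A": 30, "B": 20, "C": 95},
}
main, sub = select_references(scores, threshold=60)
```

- Here D2 becomes the main reference, and D1 is chosen as the sub-reference for unit A, the only unit below the assumed threshold in D2.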
- the score of the degree of matching for each constituent unit of the invention is output to the display screen 200.
- The degree of matching is, for example, a numerical value (%) indicating how much of the extracted feature amount of the invention is included in the similar document information (prior art document); the higher the value, the more fully the constituent unit is disclosed in the prior art document.
- the degree of matching for each structural unit is compared for each prior art document as document information and output to the comparison table 100. This output is executed by the output unit 104.
- The structural units of the decomposed invention (input information) are output in the first column from the left of the comparison table 100. For the five pieces of document information, the matching scores against each prior art document are output for each structural unit in the second to sixth columns from the left.
- the output unit 104 can output the degree of matching of the input information and the document information as a score.
- the score result is output to the display screen 200 by the output unit 104 together with the comparison table 100, for example.
- The score of the degree of matching can be expressed, for example, as a percentage (for example, 80%).
- the calculation of the degree of matching by the calculation unit 103 is executed by the user pressing the search button 109 (see FIG. 3) and inputting the search signal to the calculation unit 103.
- The output unit 104 can select the mode of the character 3 based on the new similar information. Specifically, it may be configured to select which emotional output mode of the character to use depending on the new similar information and on whether the scores of the constituent requirements are high or low.
- the output unit 104 can select the mode of the character 3 based on the newly output similar information.
- As a result, the document information evaluation device 1 can express information related to intellectual property in a way that is easy for the user to understand, so that even users with little knowledge of intellectual property law can understand it.
- the output unit 104 can specifically show the score result of the degree of matching by the calculation unit 103 via the character 3.
- As a result, the document information evaluation device 1 can express the score result in a way that is easy for the user to understand, so that even a user who lacks knowledge of intellectual property law can easily understand information about whether or not the intellectual property can be registered.
- the calculation unit 103 can recalculate the degree of matching of the similar prior art documents with respect to the input invention information, reflecting the switching of the self-evaluation mode selected by the user.
- In the comparison table output by the output unit 104, which compares for each structural unit the degree of difference from document information similar to the input information (similar prior art documents), the structural units for which the "Like!" function 115a or the "No good!" function 115b is displayed can be weighted and the degree of matching recalculated.
- FIG. 4 is a flowchart showing an operation example of the document information evaluation device 1 according to the embodiment of the present invention.
- the document information evaluation device 1 determines whether or not the information regarding the intellectual property has been acquired from the user terminal 2 (S11). Whether or not the information on the intellectual property has been acquired can be determined by whether or not the information acquisition unit 101 has acquired the information on the intellectual property input from the user terminal 2. When it is determined that the information on the intellectual property has not been acquired (step S11: NO), the document information evaluation device 1 repeats the process of S11 and waits for the acquisition of the information on the intellectual property.
- The calculation unit 103 calculates the degree of matching with the input information input from the user terminal 2, based on the document information stored in the storage unit 102 (S12). Specifically, the input information is decomposed into predetermined structural units, and for each decomposed structural unit the degree of matching with one of the plurality of pieces of document information stored in the storage unit 102 can be calculated as a score.
- The decomposition into constituent units, for example, segments the constituent requirements of the "information about intellectual property related as an inventor, creator, or applicant" at each punctuation mark. Alternatively, the text may be segmented at a fixed sentence length or at each predicate.
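- Segmentation at punctuation marks can be sketched as follows. The helper `split_into_units`, the delimiter set, and the sample claim text are assumptions made for illustration; segmentation by fixed length or by predicate would need a different approach.

```python
import re


def split_into_units(text, delimiters=",.;:、。"):
    """Split input text into constituent units at punctuation marks."""
    # Build a character class from the delimiter set; re.escape keeps
    # characters such as "." literal.
    pattern = "[" + re.escape(delimiters) + "]"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


claim = (
    "A device comprising a storage unit, "
    "a calculation unit that scores documents; "
    "and an output unit."
)
units = split_into_units(claim)
```

- Each element of `units` would then be scored separately against the stored document information.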
- After executing the process of S12, the output unit 104 outputs a comparison table showing the degree of difference between the input information and the document information for each constituent unit, based on the calculated scores (S13).
- The evaluation result of the document information is a comparison table in which the degree of difference from document information similar to the input information (hereinafter referred to as "similar information") is compared for each structural unit.
- The evaluation result of the document information is not limited to the comparison table 100.
- it may be a mock notice of reasons for refusal (a mock notice that resembles a notice of reasons for refusal), or information on intellectual property related to the inventor or applicant.
- the information regarding the intellectual property related as the inventor or the applicant is the invention memo or claim information in which the invention information is described. Similar information is, for example, prior art documents, and comparison table 100 is a so-called claim chart showing technical differences.
- the match score can be calculated as, for example, a numerical value of "0%" to "100%".
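- To make the 0% to 100% range concrete, the following toy function scores one structural unit against one document by word overlap. It is only a stand-in under stated assumptions: the specification does not define the score this way, and `match_score` and the sample strings are hypothetical.

```python
def match_score(unit, document):
    """Percentage of the unit's distinct words that also occur in the document."""
    unit_words = set(unit.lower().split())
    doc_words = set(document.lower().split())
    if not unit_words:
        return 0.0
    return 100.0 * len(unit_words & doc_words) / len(unit_words)


score = match_score(
    "a calculation unit that scores documents",
    "the calculation unit ranks documents",
)
```

- Three of the six distinct words in the unit appear in the document, so this example yields 50%.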
- After the process of S13 is executed, the user's self-evaluation of the document information in the comparison table is input to the input unit 105 (S14). A self-evaluation mode indicating the user's self-evaluation is input to the input unit 105.
- When the user presses the self-evaluation mode changeover switch 115, which indicates the self-evaluation of the document information as similar information in the comparison table 100, the self-evaluation command signal is input to the input unit 105.
- the output unit 104 displays and outputs the self-evaluation mode changeover switch 115 indicating the self-evaluation of the document information by the user to the comparison table 100.
- The self-evaluation command signal is input to the input unit 105 by clicking the self-evaluation mode changeover switch 115 with the mouse or by operating the keyboard. By clicking the self-evaluation mode changeover switch 115, the user can select the self-evaluation mode indicating self-evaluation. Further, the output unit 104 can output the self-evaluation to the comparison table 100 based on the selection by the user. More specifically, the output unit 104 displays the self-evaluation mode changeover switch 115 in the comparison table 100, and the user selects the switch by a mouse click or a keyboard operation.
- The output unit 104 can display, for example, a schematically designed figure (image) as the self-evaluation mode based on the selection operation of the self-evaluation mode changeover switch 115.
- By operating the self-evaluation mode changeover switch 115 to select the self-evaluation mode, the user gives an instruction to select the self-evaluation, and the input unit 105 receives the self-evaluation command signal.
- the output unit 104 receives the self-evaluation selection instruction.
- The user can select the self-evaluation mode indicating the self-evaluation by clicking the self-evaluation mode changeover switch 115 with the mouse. Then, based on the self-evaluation command signal from the input unit 105, the output unit 104 can switch the output, depending on the result of the self-evaluation, between a high evaluation mode indicating that the document information is good and a low evaluation mode indicating that it is not.
- the self-evaluation mode changeover switch 115 may be, for example, a button, an icon, or the like, as long as it can be switched and output.
- the score of the degree of matching can be calculated by, for example, the following processing.
- FIG. 5 is a flowchart showing an example of the score calculation process of the document information evaluation device 1 according to the first embodiment of the present invention.
- the calculation unit 103 first creates a kNN graph (S101).
- the kNN graph is created by the following procedure.
- First, the sentences included in the technical information related to intellectual property acquired by the information acquisition unit 101 via the user terminal 2 (the input information, as prior art information) and in the similar technical information, similar to the input information, stored in the storage unit 102 are vectorized.
- The vectorization may be performed by a conventional technique such as Word2Vec, Doc2Vec (Paragraph2Vec), LDA (Latent Dirichlet Allocation), or NTSG (Neural Tensor Skip-Gram).
- Vectorization is performed by the calculation unit 103.
- a kNN graph is created by the above procedure. Although explained as a sentence above, it may be a combination of a plurality of phrases, a phrase, or a word.
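- A kNN graph over sentence vectors can be sketched as follows. This is a minimal sketch that assumes cosine similarity as the closeness measure and uses hand-written two-dimensional vectors in place of Word2Vec or Doc2Vec output; `build_knn_graph` and the sentence ids are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_knn_graph(vectors, k):
    """Map each sentence id to the ids of its k most similar sentences."""
    graph = {}
    for sid, vec in vectors.items():
        neighbours = sorted(
            (other for other in vectors if other != sid),
            key=lambda other: cosine(vec, vectors[other]),
            reverse=True,
        )
        graph[sid] = neighbours[:k]
    return graph


# Toy sentence vectors; real ones would come from the vectorization step.
vectors = {
    "s1": [1.0, 0.0],
    "s2": [0.9, 0.1],
    "s3": [0.0, 1.0],
    "s4": [0.1, 0.9],
}
graph = build_knn_graph(vectors, k=1)
```

- This brute-force construction compares every pair of sentences; for a large corpus an approximate nearest-neighbour index would be the more practical choice.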
- The calculation unit 103 sets, as the output target, all the sentences included in the input information as the technical information acquired from the information acquisition unit 101, in the similar technical information similar to the input information stored in the storage unit 102, and in the intellectual property information (S102). The technical wording included in the input information (technical information) acquired from the information acquisition unit 101 is set as a query, and the output unit 104 outputs similar document information from the output target using the query (S103).
- the output may be performed by a conventional technique such as ElasticSearch (registered trademark).
- The output unit 104 sets the sentence with the highest score in the output result as the starting point (S104), adds the starting point to the final output result (S105), and checks whether the final output result contains n or more items (S106). If the number is less than n, the process proceeds to S107; if it is n or more, the process proceeds to S110.
- the calculation unit 103 extracts candidates for query conversion rules (S107).
- Candidates for query conversion rules are extracted by the following procedure. First, based on the created kNN graph, sentences similar to the sentence set as the starting point are extracted. Then, words recognized as highly important are extracted from the starting-point sentence and the extracted similar sentences. The degree of importance may be determined by a conventional technique such as the TF-IDF method. Next, for each extracted word, the words adjacent to it in the starting-point sentence and in the extracted similar sentences are acquired. For example, when the extracted word is "distributed" and the sentence is "distributed processing in a distributed file system", the adjacent words are "file" and "processing".
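- The adjacent-word step can be illustrated with the example sentence above. In this sketch, whitespace tokenization and the small stopword list are assumptions added so that English function words such as "a" are skipped; `adjacent_words` is a hypothetical helper, not part of the specification.

```python
STOPWORDS = {"a", "an", "the", "in", "of", "on", "for"}  # illustrative only


def adjacent_words(sentence, target):
    """Return the non-stopword tokens directly before or after `target`."""
    tokens = sentence.lower().split()
    found = []
    for i, token in enumerate(tokens):
        if token != target:
            continue
        for j in (i - 1, i + 1):
            if 0 <= j < len(tokens):
                word = tokens[j]
                # Skip stopwords, the target itself, and duplicates.
                if word not in STOPWORDS and word != target and word not in found:
                    found.append(word)
    return found


neighbours = adjacent_words(
    "distributed processing in a distributed file system", "distributed"
)
```

- For this sentence the sketch returns "processing" and "file", matching the example in the text.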
- the calculation unit 103 applies a conversion rule having a high score to the query (S108).
- The number of conversion rules applied may be one or more, and the number may be controlled by the calculation unit 103. Further, the number of new queries generated by the conversion rules may be controlled by the calculation unit 103 based on the user's evaluation information on the evaluation result of the technical information (input information) representing the evaluation target acquired by the information acquisition unit 101.
- the score can be calculated by the following formula.
- the sentence set as the start point is A
- the adjacent word acquired in the sentence set as the start point is w1
- the extracted similar sentence is B
- the adjacent word acquired in the extracted similar sentence is w2
- Let P(w, X) be the probability of occurrence of the word w in the sentence X.
- Similarity is an index of the semantic closeness of words; the larger this value is, the more semantically similar the two words are.
- the similarity can be a value calculated by nltk, which is a package of Python, based on the path length of WordNet.
- the calculation unit 103 sets a sentence adjacent to the start point as the next output target (S109), and outputs again using the query newly calculated by S108 (returns to S103).
- the output unit 104 may output the document information or the document information including the sentence that is the starting point as the final output result.
- the recalculation of the match score is performed by the following procedure.
- the calculation unit 103 calculates the conforming document vector and the non-conforming document vector.
- The calculation of the document vector may be performed by, for example, Word2Vec, Doc2Vec (Paragraph2Vec), LDA, NTSG, or the like.
- The conforming document vector is a vector of the document information of the constituent units for which the "Like!" function 115a is output.
- The non-conforming document vector is a vector of the document information of the constituent units for which the "No good!" function 115b is output.
- the calculation unit 103 calculates a document vector (hereinafter, referred to as "input document vector") of input information input from the user terminal 2 that can be operated by the user.
- The calculation of the document vector may be performed by, for example, Word2Vec, Doc2Vec (Paragraph2Vec), LDA, NTSG, or the like.
- the calculation unit 103 calculates the center of gravity in consideration of each weight of the input document vector, the conforming document vector, and the nonconforming document vector.
- the document information evaluation device 1 is configured to be able to recalculate the score of the degree of agreement.
- The center of gravity is calculated with reference to the weights of the input document vector, the conforming document vector, and the non-conforming document vector from before the search is performed. Specifically, when the matching score is recalculated for the second time, the center of gravity is calculated taking into account the weights calculated when the matching score was calculated the first time.
- The calculation unit 103 executes the calculation of the weights and the center of gravity. Each time the matching score is recalculated, the weights of the input document vector, the conforming document vector, and the nonconforming document vector are recalculated, and the center of gravity is recalculated. Along with this, the input document vector, the conforming document vector, and the nonconforming document vector are corrected. This correction process is executed by the calculation unit 103. Specifically, the inner products of the corrected input document vector and the conforming document vector with the center-of-gravity vector calculated in (3) are calculated.
- the calculation unit 103 again executes the correction processing (recalculation) of the weights of the input document vector, the conforming document vector, and the nonconforming document vector based on the inner product calculated in (4).
- The weight adjustment based on the correction process of the document vectors can be performed by, for example, SCDV (Sparse Composite Document Vectors using soft clustering over distributional representations).
- The document vectors may be calculated by Word2Vec, Doc2Vec (Paragraph2Vec), LDA, NTSG, or the like.
- The calculation unit 103 recalculates the score of the degree of matching of the document information with respect to the input information.
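- As one way to picture how an inner product with the center-of-gravity vector could feed the recalculation, the sketch below re-ranks documents by that inner product. Using the inner product directly as the ranking key is an assumption made for illustration; `rescore`, the vectors, and the centroid values are hypothetical.

```python
def inner_product(u, v):
    """Inner (dot) product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rescore(documents, centroid):
    """Rank documents by the inner product of their vectors with the centroid."""
    scored = {
        doc_id: inner_product(vec, centroid) for doc_id, vec in documents.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Toy document vectors and an assumed center-of-gravity vector.
documents = {"D1": [1.0, 0.0], "D2": [0.5, 0.5], "D3": [0.0, 1.0]}
centroid = [0.8, 0.2]
ranking = rescore(documents, centroid)
```

- Documents whose vectors point in the same direction as the center of gravity rise to the top, which here gives the order D1, D2, D3.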
- The recalculation of the matching score can be executed as many times as necessary until the desired document information is output, and the output unit 104 provides a user interface capable of displaying the comparison table 100 as the output result each time.
- the comparison table 100 is output as the search result.
- the score result of the degree of matching is output for each constituent unit.
- the output of the recalculation of the degree of matching is executed by pressing the search button 109 each time the recalculation is performed.
- the user interface may be generated so that the search button 109 can be continuously pressed in a short time.
- This recalculation can be repeated as many times as necessary until the document information desired by the user is output.
- the recalculation of the degree of matching is executed by the user pressing the search button 109 (see FIG. 3) and inputting the search signal to the calculation unit 103.
- As a result, the user can reduce the time required to search for document information (similar document information) whose contents are close to the desired conditions, which makes the search more efficient.
- the center of gravity is set closer to the conforming document information and away from the non-conforming document information.
- the conforming document vector is weighted.
- the center of gravity is calculated by multiplying each conforming document vector by each weight corresponding to the conforming document vector.
- the weight is calculated (adjusted) so that the center of gravity is set close to the conforming document information and away from the non-conforming document information, and the center of gravity is calculated in consideration of the weight.
- the calculation unit 103 executes the calculation of the weight and the center of gravity.
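- A center of gravity that moves toward conforming documents and away from non-conforming ones resembles Rocchio-style relevance feedback, so the sketch below uses that technique as a stand-in. It is not the formula of the specification, which is not reproduced in this text; `weighted_centroid` and the three weight parameters are assumptions.

```python
def weighted_centroid(input_vec, conforming, nonconforming,
                      w_input=1.0, w_pos=0.75, w_neg=0.15):
    """Rocchio-style center of gravity (assumed stand-in, hypothetical weights).

    Adds the weighted mean conforming vector and subtracts the weighted mean
    nonconforming vector, so the result moves toward liked documents and
    away from disliked ones.
    """
    dim = len(input_vec)

    def mean_vec(vectors):
        if not vectors:
            return [0.0] * dim
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    pos = mean_vec(conforming)
    neg = mean_vec(nonconforming)
    return [
        w_input * input_vec[i] + w_pos * pos[i] - w_neg * neg[i]
        for i in range(dim)
    ]


centroid = weighted_centroid(
    input_vec=[1.0, 1.0],
    conforming=[[2.0, 0.0]],
    nonconforming=[[0.0, 2.0]],
    w_input=1.0, w_pos=0.5, w_neg=0.5,
)
```

- In this example the centroid shifts from the input vector toward the first axis, where the conforming document lies, and away from the second axis, where the non-conforming document lies.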
- the weight adjustment can be calculated from the center of gravity vector of the center of gravity calculated in (1), the conforming document vector, and the parameters determined based on the non-conforming document vector.
- the parameters are calculated based on the following formula.
- the calculation unit 103 executes the calculation of the parameters.
- The weight can be adjusted by, for example, SCDV (Sparse Composite Document Vectors using soft clustering over distributional representations).
- The document vector may be calculated by Word2Vec, Doc2Vec (Paragraph2Vec), LDA, NTSG, or the like.
- the document information fixing unit executes the fixing of the main document information.
- The output unit 104 outputs to the comparison table 100 a figure (image) of the main document information fixing switch 106, which indicates that the main document information is fixed, and the user selects this switch by clicking it with the mouse or operating the keyboard. A main document information fixing command signal is then input to the document information fixing unit (not shown). By operating the main document information fixing switch 106, the user selects the main information fixing mode, and the device is configured to accept the user's instruction to fix a plurality of pieces of document information based on the main document information fixing command signal.
- the user can select the main information fixing mode by clicking the main document information fixing switch 106 with the mouse. It suffices if the main information fixing mode can be selected, and the main document information fixing switch 106 may be, for example, a button, an icon, or the like.
- The user can freely select the main information fixing mode, and any of the similar document information output to the comparison table 100 by the output unit 104 may be selected as the main document information. Further, the selection of the main document information is not limited to one piece; two or more pieces of document information may be selected as the main document information.
- the main document information can be fixed by the user inputting the document information in the input box 107.
- the document information that can be input in this case is not limited to one.
- the user can input two or more document information.
- the user inputs the main document information into the input box 107 and presses the specific switch 108 to recalculate the degree of matching.
- The document information to be input in the input box 107 includes, for example, a notice of reasons for refusal, a mock notice of reasons for refusal (a mock notice resembling a notice of reasons for refusal), and information about intellectual property with which the user is associated as an inventor or an applicant.
- the information about the intellectual property related as the inventor or the applicant is the invention memo or claim information in which the invention information is described. Similar information includes, for example, prior art documents.
- the document information input to the input box 107 is not limited to the preceding patent document number.
- the document information to be input to the input box 107 includes text data and numerical data.
- Textual data includes, for example, information about intellectual property, idea sheets, idea memos, information related to litigation, papers, books (including magazines and weekly magazines), reports and homepages.
- Numerical data includes, for example, experimental data, measurement data, statistical data, and inspection data.
- The document information to be input to the input box 107 includes mathematical data, chart data, photographic data, and image data (including still images and moving images). In that case, PDF electronic data can be taken into the input box 107, for example, by dragging and dropping it. PDF electronic data of a prior patent document can likewise be captured by dragging and dropping.
- The output unit 104 can specify at least one of the plurality of structural units obtained by decomposing the input information, and can output (extract) a plurality of pieces of document information that include the specified structural unit.
- A structural unit can be specified by switching the self-evaluation mode changeover switch 115 to the "Like" function 115a. The score of the degree of matching is then calculated for the plurality of pieces of document information (similar prior art documents) that include the specified structural unit, and that document information is displayed in the comparison table 100. In this way, prior art documents that are more similar to the input information can be output.
- the calculation of the plurality of document information including the specific structural unit is performed by the same method as the recalculation calculation process described above. Then, among the document information output by the output unit 104, the specific document information can be fixed as the main document information by the user pressing the main document information fixing switch 106.
- the recalculation is performed by adjusting the weight of the input document information based on the main document information.
- the recalculation method is the same as the recalculation method based on the self-evaluation mode described above.
- The main document information may include, for example, citation information and reference information cited as a result of past examinations at the patent offices of each country, as well as patent documents and non-patent documents such as books and magazines that the user has personally searched for in the past.
- the document information evaluation device 1 may include a determination unit (not shown) for determining the possibility of acquiring a right.
- the determination unit can search for similar prior art documents similar to the recognized invention, and execute, for example, a process of determining the possibility of acquiring a right depending on the presence or absence of the similar invention. Judgment as to whether or not the inventions are similar can be made, for example, by recognizing the meaning (implication) of the recognized invention and whether or not a cited invention having similar implications can be searched.
- the cited invention is a published patent document or a non-patent document. As the patent document, for example, a document such as a patent gazette published by the Japan Patent Office can be used.
- As the non-patent document, a document published in an academic journal, a newspaper, a website, or the like can be used.
- the patent document or the non-patent document can be stored in a dedicated database (not shown) so that it can be searched by a determination unit (not shown).
- A keyword is extracted from the words included in the recognized invention, and synonyms for the keyword are obtained from a database (not shown) that stores synonyms, near-synonyms, or derivative words (hereinafter, "synonyms, etc.").
- the determination unit may calculate the degree of similarity between sentences as the degree of similarity.
- the determination unit may determine that the possibility of acquiring the right is high when the similarity of the calculated sentences is small. On the other hand, the determination unit (not shown) may determine that the possibility of acquiring the right is low when the calculated sentences have a high degree of similarity.
- The judgment unit may make a rank-based determination according to how high or low the possibility of acquiring the right is, for example "S rank (very high possibility)", "A rank (high possibility)", "B rank (possible)", and "C rank (low possibility)". The determination is not limited to a display from S rank to C rank; it may, for example, be displayed from ⁇ to ⁇ in descending order of probability.
- the judgment unit can judge the possibility of acquisition of rights based on the examination results of acquisition of rights that have been examined in the past by the patent offices of each country.
- the examination result of acquisition of rights is the invention related to the application, the cited reference, and the examination result (whether or not it was rejected based on the cited document) in comparison between the two.
- The judgment unit may calculate the similarity between the text of the invention of the application and the text of the cited reference, learn how the calculated similarity compares with the examination result, and determine the possibility of acquiring the right. By learning the comparison between the calculated similarity and past examination results, the judgment unit (not shown) can use judgments made by the JPO in the past as its judgment criteria, and the accuracy of determining the possibility of acquiring the right can thus be improved.
- the storage unit 102 may be configured to store the examination result in advance.
- the examination result can be obtained, for example, from the examination information published by the patent offices of each country.
- the determination unit (not shown) may determine the possibility of acquiring the right based on the examination result.
- The output unit 104 controls the information output to the user terminal 2 based on the data related to the intellectual property calculated by the calculation unit 103 or on similar information. Specifically, the output unit 104 can control the appearance of the character 3 based on the information on the possibility of acquiring the right calculated by the calculation unit 103. For example, when the determination unit (not shown) determines, as the information on the possibility of acquiring the right, that the application would be rejected, a sad character is selected from the character information stored in the storage unit 102, and an output instruction is issued to the output unit 104 so that the character is displayed on the display screen 200.
- Based on the above-mentioned score or the information on the possibility of acquiring the right, an output instruction may be issued to the output unit 104 so as to select a sad character from the character information stored in the storage unit 102 and display it.
- the output unit 104 may output the keyword calculated by the calculation unit 103 via the character 3.
- The calculation unit 103 extracts a keyword from the information about the intellectual property stored in the storage unit 102. Specifically, when the judgment unit (not shown) determines, as the information on the possibility of acquiring the right, that the application would be rejected, a new keyword is calculated by logic that determines what kind of new keyword would have to be added so that the determination is no longer a rejection.
- The calculation unit 103 could enumerate countless new keywords, but since that would be cumbersome, it may instead calculate keywords described in the claims of publications in the same or a similar technical field.
- the judgment unit may machine-learn the past examination results and judge the possibility of acquiring the right.
- the examination result is acquired by the information acquisition unit 101.
- The judgment unit performs machine learning (supervised learning) on a data set in which the invention related to the application and the cited document are the input and the examination result is the output.
- the possibility of acquiring rights can be determined by modeling the obtained data set.
- the dataset can be modeled as a different model depending on, for example, the country, applicable law (including law revision), field of invention, and the like.
- the determination unit (not shown) can improve the accuracy of determination regarding the possibility of acquiring the right by using the learning results learned in each modeling.
- Because the judgment unit (not shown) machine-learns the new examination results acquired by the information acquisition unit 101, even if the examination tendency at the JPO changes, the possibility of acquiring rights can be judged in a way that follows that change.
- For the machine learning, a supervised learning technique or an unsupervised learning technique may be used.
- As the learning technique of machine learning, for example, a neural network (including deep learning), a support vector machine, clustering (for example, a task, a first embodiment, etc.), a Bayesian network, or the like may be used.
- A program for realizing the functions of the device described in the present embodiment may be stored in a computer-readable storage medium, and the various processes of the present embodiment described above may be performed by having a computer system read and execute the program stored in that storage medium.
- The "computer system" referred to here may include an OS and hardware such as peripheral devices.
- The "computer system" also includes a homepage providing environment (or display environment) if a WWW system is used.
- The "computer-readable storage medium" refers to a storage device such as a flexible disk, a magneto-optical disk, a ROM, a writable non-volatile memory such as a flash memory, a portable medium such as a CD-ROM, or a hard disk built into a computer system.
- The "computer-readable storage medium" also includes a medium that holds the program for a certain period of time, such as a volatile memory (for example, DRAM (Dynamic Random Access Memory)) inside a computer system that serves as a server or a client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line. Further, the program may be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium or by a transmission wave in the transmission medium.
- the "transmission medium” for transmitting a program refers to a medium having a function of transmitting information, such as a network (communication network) such as the Internet or a communication line (communication line) such as a telephone line.
- the above program may be for realizing a part of the above-mentioned functions. Further, it may be a so-called difference file (difference program) that realizes the above-mentioned function in combination with a program already stored in the computer system.
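The rank-based determination of the possibility of acquiring a right described earlier in this list (a low similarity to the prior art suggests a high possibility, a high similarity a low one) might be sketched as follows. This is only an illustration: the function name `rank_from_similarity`, the assumption that similarity is normalized to the range 0 to 1, and the three threshold values are all hypothetical and do not come from the specification.

```python
def rank_from_similarity(similarity: float) -> str:
    """Map a sentence similarity in [0, 1] to a rank from S to C.

    Lower similarity to the closest prior art is treated as a higher
    possibility of acquiring the right. The thresholds below are
    hypothetical placeholders, not values taken from the specification.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0 and 1")
    if similarity < 0.25:
        return "S"  # very high possibility
    if similarity < 0.50:
        return "A"  # high possibility
    if similarity < 0.75:
        return "B"  # possible
    return "C"      # low possibility


ranks = [rank_from_similarity(s) for s in (0.10, 0.40, 0.60, 0.90)]
```

In a system that learns from past examination results, the thresholds would presumably be fitted to that data rather than fixed by hand.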
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Business, Economics & Management (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Tourism & Hospitality (AREA)
- Technology Law (AREA)
- Multimedia (AREA)
- General Business, Economics & Management (AREA)
- Economics (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Human Resources & Organizations (AREA)
- Entrepreneurship & Innovation (AREA)
- Operations Research (AREA)
- Marketing (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Probability & Statistics with Applications (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
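In FIG. 4, the document information evaluation device 1 determines whether information about intellectual property has been acquired from the user terminal 2 (S11). Whether such information has been acquired can be determined by whether the information acquisition unit 101 has acquired the information about intellectual property input from the user terminal 2. If it determines that information about intellectual property has not been acquired (step S11: NO), the document information evaluation device 1 repeats the processing of S11 and waits for the information about intellectual property to be acquired.
When the information acquisition unit 101 acquires input information representing the evaluation target, the calculation unit 103 first creates a kNN graph (S101). The kNN graph is created by the following procedure. First, all sentences are vectorized that are contained in the technical information accompanying the technical information about intellectual property acquired by the information acquisition unit 101 via the user terminal 2, in the input information serving as prior art information, and in the similar technical information that is stored in the storage unit 102 and resembles the input information. The vectorization may be performed by a conventional technique such as Word2Vec, Doc2Vec (Paragraph2vec), LDA (Latent Dirichlet Allocation), or NTSG (Neural Tensor Skip-Gram). The calculation unit 103 performs the vectorization. A distance matrix between sentences is then created from the vectors. With each sentence as a vertex, edges are drawn from each sentence to the k sentences closest to it. The kNN graph is created by the above procedure. Although the description above refers to sentences, the unit may instead be a combination of several clauses, a single clause, or a word.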
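The kNN graph procedure above (vectors, then a distance matrix, then edges from each vertex to its k nearest neighbours) might be sketched as follows. This is a minimal sketch under stated assumptions: the sentence vectors are hand-written two-dimensional toy values rather than the output of Word2Vec or a similar model, Euclidean distance is assumed because the text does not name a metric, and the names `distance_matrix` and `knn_graph` are hypothetical.

```python
import math


def distance_matrix(vectors):
    """Pairwise Euclidean distances between sentence vectors."""
    n = len(vectors)
    return [[math.dist(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


def knn_graph(vectors, k):
    """Return {vertex: [indices of its k nearest other vertices]}.

    Each sentence is a vertex; a directed edge runs from a sentence to
    each of the k sentences at the shortest distance from it.
    """
    dist = distance_matrix(vectors)
    graph = {}
    for i, row in enumerate(dist):
        others = [j for j in range(len(row)) if j != i]
        # Sort by distance, breaking ties by index for determinism.
        others.sort(key=lambda j: (row[j], j))
        graph[i] = others[:k]
    return graph


# Toy stand-ins for sentence embeddings (assumed values).
sentence_vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
graph = knn_graph(sentence_vectors, k=2)
```

The full distance matrix costs quadratic time and memory in the number of sentences, so a large document store would likely need an approximate nearest-neighbour index instead.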
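In the formula, A denotes the sentence set as the starting point, w1 an adjacent word acquired from the sentence set as the starting point, B the extracted similar sentence, and w2 an adjacent word acquired from the extracted similar sentence; P(w, X) denotes the probability that word w appears in sentence X. similarity is an index of the semantic closeness of words; the larger its value, the more semantically similar the two words are. similarity can be the value that nltk, a Python package, calculates based on WordNet path length.
The score of the degree of matching is recalculated, for example, by the following procedure. (1) First, the conforming document information and the non-conforming document information are vectorized (hereinafter referred to as the "conforming document vector" and the "non-conforming document vector", respectively). The calculation unit 103 calculates the conforming document vector and the non-conforming document vector. The document vectors may be calculated by, for example, Word2Vec, Doc2Vec (Paragraph2vec), LDA, or NTSG. Here, the conforming document vector is the vectorized document information of a structural unit for which the "Like!" function 115a was output. The non-conforming document vector is the vectorized document information of a structural unit for which the "No good!" function 115b was output.
Next, the method of calculating the center of gravity is described in detail. The center of gravity is set at a position close to the conforming document information and far from the non-conforming document information. (1) First, the conforming document vector is multiplied by a weight. When there are multiple conforming document vectors, each conforming document vector is multiplied by its own weight and the center of gravity is calculated. The weights are calculated (adjusted) so that the center of gravity is set at a position close to the conforming document information and far from the non-conforming document information, and the center of gravity is calculated taking those weights into account. The calculation unit 103 calculates the weights and the center of gravity.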
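One way to picture a center of gravity that sits near the conforming documents and away from the non-conforming ones is the sketch below. It is not the weight-adjustment method of the specification, which is not spelled out in this passage; it substitutes a simple Rocchio-style shift, and the function names, the `gamma` parameter, and the toy vectors are all assumptions made for illustration.

```python
def weighted_centroid(vectors, weights):
    """Weighted mean of equal-length vectors."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total
            for d in range(dim)]


def feedback_centroid(conforming, weights, non_conforming, gamma=0.5):
    """Centroid of the conforming vectors, pushed away from the
    mean of the non-conforming vectors by a factor gamma (assumed)."""
    pos = weighted_centroid(conforming, weights)
    if not non_conforming:
        return pos
    neg = weighted_centroid(non_conforming, [1.0] * len(non_conforming))
    # Move from pos in the direction pointing away from neg.
    return [p + gamma * (p - n) for p, n in zip(pos, neg)]


# Toy document vectors (assumed values).
liked = [(1.0, 1.0), (3.0, 1.0)]      # marked with the "Like!" function
liked_weights = [1.0, 3.0]            # per-vector weights
disliked = [(2.5, 3.0)]               # marked with the "No good!" function
center = feedback_centroid(liked, liked_weights, disliked, gamma=0.5)
```

With these values the weighted centroid of the liked vectors is (2.5, 1.0), and the shift moves it further from the disliked vector along the second axis.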
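2 User terminal
101 Information acquisition unit
102 Storage unit
103 Calculation unit
104 Output unit
105 Input unit
115 Self-evaluation mode changeover switch
115a "Like!" function
115b "No good!" function
106 Fixing switch
107 Input box
108 Specific switch
109 Search switch
100 Comparison table
200 Display screen
NW Network
11 CPU
12 RAM
13 ROM
14 Touch panel
15 Communication I/F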
Claims (9)
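- A document information evaluation device comprising:
an information acquisition unit that acquires, from a user terminal operable by a user, input information input from the user terminal;
a storage unit that stores a plurality of pieces of document information;
a calculation unit that decomposes the input information into predetermined structural units and calculates, as a score for each decomposed structural unit, the degree of matching with one piece of document information among the plurality of pieces of document information stored in the storage unit;
an output unit that outputs, based on the score, a comparison table showing, for each structural unit, the degree of difference between the input information and the document information; and
an input unit for inputting, to the comparison table, the user's self-evaluation of the document information.
- The document information evaluation device according to claim 1, wherein the output unit
switches between and outputs a high-evaluation mode indicating that the document information is good and a low-evaluation mode indicating that it is not good, according to the result of the self-evaluation input via the input unit.
- The document information evaluation device according to claim 2, wherein the output unit
outputs each piece of document information, for each structural unit, switched between the high-evaluation mode and the low-evaluation mode.
- The document information evaluation device according to any one of claims 1 to 3, wherein the output unit
is capable of outputting to the comparison table, for each structural unit of the input information, the degree of difference between the input information and a plurality of pieces of the document information, and
the output priority of the plurality of pieces of document information is determined by whether the score calculated for each structural unit satisfies a predetermined criterion.
- The document information evaluation device according to any one of claims 1 to 4, wherein the input information and the plurality of pieces of document information include information about intellectual property.
- The document information evaluation device according to claim 5, wherein the calculation unit
recalculates the degree of matching of the document information with the input information, reflecting the switching of the self-evaluation mode that indicates the self-evaluation.
- The document information evaluation device according to claim 5 or 6, further comprising a document information fixing unit that fixes, as main document information, at least one piece of document information desired by the user among the plurality of pieces of document information output to the comparison table, wherein
the calculation unit
recalculates the degree of matching of the document information with the input information based on the main document information fixed by the document information fixing unit.
- A document information evaluation method in which a computer performs:
an information acquisition step of acquiring, from a user terminal operable by a user, input information input from the user terminal;
a storage step of storing a plurality of pieces of document information;
a calculation step of decomposing the input information into predetermined structural units and calculating, as a score for each decomposed structural unit, the degree of matching with one piece of document information among the plurality of pieces of document information stored in the storage step;
an output step of outputting, based on the score, a comparison table showing, for each structural unit, the degree of difference between the input information and the document information; and
an input step of inputting, to the comparison table, the user's self-evaluation of the document information.
- A document information evaluation program that causes a computer to execute:
an information acquisition function of acquiring, from a user terminal operable by a user, input information input from the user terminal;
a storage function of storing a plurality of pieces of document information;
a calculation function of decomposing the input information into predetermined structural units and calculating, as a score for each decomposed structural unit, the degree of matching with one piece of document information among the plurality of pieces of document information stored by the storage function;
an output function of outputting, based on the score, a comparison table showing, for each structural unit, the degree of difference between the input information and the document information; and
an input function of inputting, to the comparison table, the user's self-evaluation of the document information.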
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
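JP2019526021A JP6555704B1 (ja) | 2019-04-08 | 2019-04-08 | Document information evaluating device, document information evaluating method, and document information evaluating program |
CN201980025031.3A CN112352229A (zh) | 2019-04-08 | 2019-04-08 | Document information evaluating device, document information evaluating method, and document information evaluating program |
PCT/JP2019/015368 WO2020208693A1 (ja) | 2019-04-08 | 2019-04-08 | Document information evaluating device, document information evaluating method, and document information evaluating program |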
EP19924277.7A EP3955125A4 (en) | 2019-04-08 | 2019-04-08 | DEVICE, METHOD AND PROGRAM FOR EVALUATION OF DOCUMENT INFORMATION |
US16/963,851 US11023721B2 (en) | 2019-04-08 | 2019-04-08 | Document information evaluating device, document information evaluating method, and document information evaluating program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2019/015368 WO2020208693A1 (ja) | 2019-04-08 | 2019-04-08 | Document information evaluating device, document information evaluating method, and document information evaluating program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020208693A1 (ja) | 2020-10-15 |
Family
ID=67539790
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2019/015368 WO2020208693A1 (ja) | Document information evaluating device, document information evaluating method, and document information evaluating program | 2019-04-08 | 2019-04-08 |
Country Status (5)
Country | Link |
---|---|
US (1) | US11023721B2 (ja) |
EP (1) | EP3955125A4 (ja) |
JP (1) | JP6555704B1 (ja) |
CN (1) | CN112352229A (ja) |
WO (1) | WO2020208693A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7421740B1 (ja) | 2023-09-12 | 2024-01-25 | Patentfield株式会社 | Analysis program, information processing device, and analysis method |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11263646B1 (en) * | 2014-03-31 | 2022-03-01 | Groupon, Inc. | Systems, apparatus, and methods of programmatically determining unique contacts |
JPWO2021009886A1 (ja) * | 2019-07-17 | 2021-01-21 | ||
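WO2021152809A1 (ja) * | 2020-01-30 | 2021-08-05 | 株式会社 AI Samurai | Document information evaluating device, document information evaluating method, and document information evaluating program |
WO2022172369A1 (ja) * | 2021-02-10 | 2022-08-18 | Mitsubishi Electric Corporation | Screen data creation program, screen data creation device, and screen data creation method |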
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001117937A (ja) * | 1999-10-20 | 2001-04-27 | Hitachi Ltd | Document retrieval method and device |
JP2007018389A (ja) * | 2005-07-08 | 2007-01-25 | Just Syst Corp | Data retrieval device, data retrieval method, data retrieval program, and computer-readable recording medium |
JP2009294993A (ja) * | 2008-06-06 | 2009-12-17 | Konica Minolta Holdings Inc | Related document extraction method, related document extraction system, and related document extraction program |
JP2015203961A (ja) | 2014-04-14 | 2015-11-16 | 株式会社toor | Document extraction system |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0628403A (ja) | 1992-07-09 | 1994-02-04 | Mitsubishi Electric Corp | Document retrieval device |
JP4038717B2 (ja) * | 2002-09-13 | 2008-01-30 | Fuji Xerox Co., Ltd. | Text sentence comparison device |
US20050210042A1 (en) * | 2004-03-22 | 2005-09-22 | Goedken James F | Methods and apparatus to search and analyze prior art |
US7890539B2 (en) * | 2007-10-10 | 2011-02-15 | Raytheon Bbn Technologies Corp. | Semantic matching using predicate-argument structure |
US20100131513A1 (en) * | 2008-10-23 | 2010-05-27 | Lundberg Steven W | Patent mapping |
CN103348380B (zh) * | 2011-02-10 | 2016-08-10 | NEC Corporation | Difference region detection system and difference region detection method |
US9230061B2 (en) * | 2011-08-15 | 2016-01-05 | Medcpu, Inc. | System and method for text extraction and contextual decision support |
JP6198866B2 (ja) * | 2016-02-05 | 2017-09-20 | 雲拓科技有限公司 | Patent search method |
US20180018564A1 (en) * | 2016-07-13 | 2018-01-18 | Palantir Technologies Inc. | Artificial intelligence-based prior art document identification system |
WO2018051233A1 (en) * | 2016-09-14 | 2018-03-22 | FileFacets Corp. | Electronic document management using classification taxonomy |
US20180189909A1 (en) * | 2016-12-30 | 2018-07-05 | At&T Intellectual Property I, L.P. | Patentability search and analysis |
JP6704089B2 (ja) * | 2017-04-06 | 2020-06-03 | Hitachi, Ltd. | Library search device, library search system, and library search method |
US11308320B2 (en) * | 2018-12-17 | 2022-04-19 | Cognition IP Technology Inc. | Multi-segment text search using machine learning model for text similarity |
-
2019
- 2019-04-08 WO PCT/JP2019/015368 patent/WO2020208693A1/ja unknown
- 2019-04-08 US US16/963,851 patent/US11023721B2/en active Active
- 2019-04-08 JP JP2019526021A patent/JP6555704B1/ja active Active
- 2019-04-08 EP EP19924277.7A patent/EP3955125A4/en not_active Withdrawn
- 2019-04-08 CN CN201980025031.3A patent/CN112352229A/zh active Pending
Non-Patent Citations (1)
Title |
---|
See also references of EP3955125A4 |
Also Published As
Publication number | Publication date |
---|---|
JPWO2020208693A1 (ja) | 2021-04-30 |
US20210056304A1 (en) | 2021-02-25 |
US11023721B2 (en) | 2021-06-01 |
JP6555704B1 (ja) | 2019-08-07 |
CN112352229A (zh) | 2021-02-09 |
EP3955125A1 (en) | 2022-02-16 |
EP3955125A4 (en) | 2022-04-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020208693A1 (ja) | Document information evaluating device, document information evaluating method, and document information evaluating program | |
AU2018205185B2 (en) | Scalable font pairing with asymmetric metric learning | |
US20180285326A1 (en) | Classifying and ranking changes between document versions | |
KR20200094627A (ko) | Method, apparatus, device, and medium for determining text relevance | |
US20160306800A1 (en) | Reply recommendation apparatus and system and method for text construction | |
EP3203383A1 (en) | Text generation system | |
WO2018125585A1 (en) | Graph long short term memory for syntactic relationship discovery | |
US20170076151A1 (en) | Assigning of topical icons to documents to improve file navigation | |
US20130036076A1 (en) | Method for keyword extraction | |
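JP6653833B1 (ja) | Document information evaluating device, document information evaluating method, and document information evaluating program | |
CN111194457A (zh) | Patent evaluation determination method, patent evaluation determination device, and patent evaluation determination program | |
JP2021086592A (ja) | Document information evaluating device, document information evaluating method, and document information evaluating program | |
JP7029204B1 (ja) | Technical research support device, technical research support method, and technical research support program | |
KR20200053334A (ko) | Method and system for building a researcher map to promote convergence research | |
Tuarob et al. | Automated discovery of product feature inferences within large-scale implicit social media data | |
RU2719463C1 (ru) | Topic models with sentiment priors based on distributed representations | |
JP2021128620A (ja) | Document information evaluating device, document information evaluating method, and document information evaluating program | |
JP2020173759A (ja) | Document information evaluating device, document information evaluating method, and document information evaluating program | |
WO2021152809A1 (ja) | Document information evaluating device, document information evaluating method, and document information evaluating program | |
CN113015971B (zh) | Cluster analysis method, cluster analysis system, and readable storage medium | |
JP6916476B2 (ja) | Intellectual property support device, intellectual property support method, and intellectual property support program | |
JP2017208047A (ja) | Information retrieval method, information retrieval device, and program | |
Panigrahi et al. | A review of recent advances in text mining of Indian languages | |
WO2021245814A1 (ja) | Document information evaluating device, document information evaluating method, and document information evaluating program | |
CN112445959A (zh) | Retrieval method, retrieval device, computer-readable medium, and electronic device |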
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | ENP | Entry into the national phase | Ref document number: 2019526021; Country of ref document: JP; Kind code of ref document: A |
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19924277; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | ENP | Entry into the national phase | Ref document number: 2019924277; Country of ref document: EP; Effective date: 20211108 |