WO2021245814A1 - Document information evaluation device, document information evaluation method, and document information evaluation program - Google Patents

Document information evaluation device, document information evaluation method, and document information evaluation program

Info

Publication number
WO2021245814A1
Authority
WO
WIPO (PCT)
Prior art keywords
document information
unit
search
user terminal
information
Prior art date
Application number
PCT/JP2020/021848
Other languages
English (en)
Japanese (ja)
Inventor
崇志 三上
一 白坂
Original Assignee
株式会社 AI Samurai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社 AI Samurai
Priority to JP2022529199A (JPWO2021245814A1, ja)
Priority to PCT/JP2020/021848 (WO2021245814A1, fr)
Publication of WO2021245814A1 (WO2021245814A1, fr)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation

Definitions

  • The present invention relates to a document information evaluation device, a document information evaluation method, and a document information evaluation program.
  • Patent Document 1 describes calculating the weight of each document from the frequency of appearance of keywords in each segment constituting the document, and scoring similarity from that weight according to a predetermined standard. According to the system of Patent Document 1, since similarity is scored for each segment of the document, a document can be reliably retrieved even if content close to the condition is described only in part of it.
  • In addition, for searches that target patent documents, a system is provided in which a logical combination of search keywords (specifically, logical product (AND), logical sum (OR), etc.) is specified as a search condition and prior patent documents satisfying that condition are extracted (for example, Non-Patent Document 1).
  • In such a system, the user can improve search accuracy by inputting all possible synonyms of a search keyword and combining them with the logical sum (OR).
  • However, with the system of Patent Document 1, a document whose overall similarity is low may be retrieved merely because content close to the condition is described in part of the document. Furthermore, when a document not intended by the user is retrieved, the user must restart the selection of the keywords, phrases, sentences, etc. used for the conditions from the beginning, and must repeat this selection many times until a document with content close to the condition is found. This takes a lot of time and is very burdensome for the user.
  • With the system of Non-Patent Document 1, thinking of synonyms for search keywords takes effort, and the result tends to depend on the user's search skill and technical knowledge. Therefore, there is a demand for a document information evaluation device that allows even a user with little search experience or technical knowledge to perform a certain level of search or evaluation.
  • The present invention has been made in view of the above, and provides a document information evaluation device, a document information evaluation method, and a document information evaluation program capable of performing a certain level of search or evaluation regardless of the user's experience.
  • The document information evaluation device includes: an acquisition unit that acquires document information input from a user terminal; a generation unit that decomposes the document information into predetermined constituent units and generates, as a search query, a search formula for each constituent unit based on the keywords included in that constituent unit; an extraction unit that, based on the search query, calculates as a score, for each constituent unit, the degree of matching with a plurality of pieces of preceding document information stored in a predetermined storage unit, and extracts the preceding document information whose score meets a predetermined criterion; and an output unit that outputs the preceding document information extracted by the extraction unit to the user terminal. The output unit outputs, to the user terminal, search formula information for displaying the search formula generated by the generation unit for each constituent unit.
  • The acquisition unit may acquire information on an operation input at the user terminal that changes the search expression, and the extraction unit may extract the preceding document information satisfying a predetermined criterion by using the search expression changed by the operation input as the search query.
  • The preceding document information may be composed of a plurality of items. In that case, the output unit outputs information for displaying check boxes with which the items of the preceding document information used for calculating the degree of matching with the document information can be selected.
  • The acquisition unit acquires, from the user terminal, item selection information on the items selected in the preceding document information, and the extraction unit extracts the preceding document information that meets the predetermined criterion based on the item selection information and the search query.
  • The output unit may output search expression information for displaying the search expressions used as the search query in a selectable manner. The acquisition unit then acquires, from the user terminal, search expression selection information on the selected search expression, and the extraction unit extracts the preceding document information satisfying a predetermined criterion by using the selected search expression as the search query based on the search expression selection information.
  • The output unit may output a comparison table showing, for each constituent unit and based on the score, the degree of difference between the document information and the preceding document information extracted by the extraction unit.
  • the document information and the plurality of preceding document information include information related to intellectual property.
  • The document information evaluation method in one embodiment of the present invention causes a computer to execute: a step of acquiring, from a user terminal, document information input at the user terminal; a step of decomposing the document information into predetermined constituent units and generating, as a search query, a search formula for each constituent unit based on the keywords included in that constituent unit; a step of calculating as a score, for each constituent unit and based on the search query, the degree of matching with a plurality of pieces of preceding document information stored in a predetermined storage unit, and extracting the preceding document information whose score meets a predetermined criterion; and a step of outputting the extracted preceding document information to the user terminal. The outputting step outputs, to the user terminal, search expression information for displaying the search expression generated in the generating step for each constituent unit.
  • The document information evaluation program causes a computer to realize: an acquisition function of acquiring, from a user terminal, document information input at the user terminal; a generation function of decomposing the document information into predetermined constituent units and generating, as a search query, a search formula for each constituent unit based on the keywords included in that constituent unit; an extraction function of calculating as a score, for each constituent unit and based on the search query, the degree of matching with a plurality of pieces of preceding document information stored in a predetermined storage unit, and extracting the preceding document information whose score meets a predetermined criterion; and an output function of outputting the preceding document information extracted by the extraction function to the user terminal. The output function outputs, to the user terminal, search expression information for displaying the search expression generated by the generation function for each constituent unit.
  • According to the present invention, it is possible to provide a document information evaluation device or the like capable of performing a certain level of search or evaluation regardless of the user's experience.
  • This is an example of the hardware configuration of the document information evaluation device (computer) according to the embodiment of the present invention. It is a figure which shows an example of the display screen of the user terminal which concerns on one Embodiment of this invention. It is a figure which shows an example of the display screen of the user terminal which concerns on one Embodiment of this invention. It is a figure which shows an example of the display screen of the user terminal which concerns on one Embodiment of this invention. It is a flowchart which shows the operation example of the document information evaluation apparatus which concerns on one Embodiment of this invention. It is a flowchart which shows the operation example of the document information evaluation apparatus which concerns on one Embodiment of this invention.
  • FIG. 1 is a schematic diagram of a document information evaluation system 500 according to an embodiment of the present invention.
  • the document information evaluation system 500 includes a document information evaluation device 100, a user terminal 200, and a preceding document information database (DB) 400 connected to each other via a network 300.
  • the number of user terminals 200 is not limited to those shown in the figure, and may include a plurality of user terminals.
  • The document information evaluation system 500 is a system that compares the document information with a plurality of pieces of preceding document information, extracts the preceding document information similar to the document information, and can evaluate the degree of similarity (matching degree) between the document information and the extracted preceding document information.
  • the document information is some information about the document, and may be the text itself included in the document, or may be a path name, a URL, or the like indicating the storage location of the document.
  • The document information may include, for example, text data (idea sheets, idea memos, information related to litigation, papers, books (including magazines and weekly magazines), reports, and homepages) and numerical data (experimental data, measurement data, statistical data, inspection data).
  • the document information may include mathematical data, chart data, photographic data, and image data (including still images and moving images). In this embodiment, as an example, a case where the document information and the preceding document information described later are information related to intellectual property will be described.
  • intellectual property is an idea or creation created by human intellectual activity.
  • Intellectual property is, for example, an invention, a device, a design, a trademark, a copyrighted work, a circuit arrangement, or a new variety of plant.
  • The information on intellectual property may be, for example, a document explaining the contents of the intellectual property, a figure, table, graph, sketch, or photograph (figures, etc.) explaining those contents, or a document explaining such figures, etc.
  • the information regarding the intellectual property in the present embodiment is the information for extracting the content that the user wants to search or analyze as described above.
  • Information on intellectual property includes not only information on which rights have been acquired, but also public information before acquisition of rights, undisclosed information, and invention information before filing an application.
  • the acquired information is, for example, information for which a patent right, a utility model right, a design right, a trademark right, a copyright, a circuit layout usage right, a breeder's right, etc. are established.
  • If the intellectual property is an invention, the document information and the preceding document information are information such as sentences (statement of claims, subject of invention, purpose of invention, etc.) or drawings showing the content of the invention.
  • If the intellectual property is a design, the document information and the preceding document information are information such as a shape, a pattern, or a color, or a drawing of a combination thereof. If the intellectual property is a trademark, the document information and the preceding document information are identification marks of goods or services.
  • the information on intellectual property may include information before the acquisition of rights as described above.
  • Information before the acquisition of rights is, for example, information that records the process of creating an invention or design, such as the materials or devices prepared for an experiment, the experimental results, and ancillary information such as the title of the research and development, its purpose, the engineer's name, the engineer's affiliation, and the project number.
  • Ancillary information may include information on access rights to information about acquired intellectual property.
  • The access authority is the authority to execute processing such as viewing, editing, deleting, and authenticating information. For example, an engineer who has stored information on intellectual property is granted access authority to execute all processing, a technician who collaborated in the creation of the intellectual property is granted access authority to execute the browsing process, and a certifier who authenticates the information about the intellectual property is granted access authority to execute the authentication process.
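The access authority scheme described above can be sketched as a small role-to-permission table. This is only an illustrative sketch: the role names (`owner`, `collaborator`, `certifier`), the `Access` flag type, and the `is_allowed` helper are hypothetical and do not appear in the source.

```python
from enum import Flag, auto


class Access(Flag):
    """Processing types named in the description (viewing, editing, deleting, authentication)."""
    VIEW = auto()
    EDIT = auto()
    DELETE = auto()
    AUTHENTICATE = auto()


# Hypothetical role names; the source only describes the three kinds of people.
ROLE_PERMISSIONS = {
    # Engineer who stored the information: all processing.
    "owner": Access.VIEW | Access.EDIT | Access.DELETE | Access.AUTHENTICATE,
    # Technician who collaborated in the creation: browsing only.
    "collaborator": Access.VIEW,
    # Certifier: authentication processing only.
    "certifier": Access.AUTHENTICATE,
}


def is_allowed(role: str, action: Access) -> bool:
    """Return True only if every flag in `action` is granted to `role`."""
    granted = ROLE_PERMISSIONS.get(role, Access(0))
    return (granted & action) == action
```

An unknown role is granted nothing, which keeps the default on the restrictive side.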
  • In the following, the case where the intellectual property is an invention is described, but the intellectual property is not limited to an invention. That is, the creation of intellectual property may include, for example, the selection of an identification mark for a trademark.
  • The preceding document information database 400 stores existing information among the above-mentioned information on intellectual property.
  • The preceding document information database 400 can be, for example, a database of the Japan Patent Office.
  • The patent office database may cover one or more patent offices.
  • the database is not limited to the above, and may be information existing on the Internet.
  • the database is not limited to the JPO database, and may be a separate database for storing existing information among document information.
  • the database is not limited to the one provided on the network as shown in the figure, and may be stored in the document information evaluation device 100.
  • the user terminal 200 is a communication terminal of a user who uses the document information evaluation service provided by the document information evaluation system 500.
  • In the figure, the user terminal 200 is shown as a notebook computer, but the user terminal 200 may be of any type as long as the document information evaluation service can be used via the network 300.
  • The user terminal 200 may be, for example, a desktop personal computer, a smartphone, a mobile phone (feature phone), a handheld computer device (for example, a PDA (Personal Digital Assistant)), a wearable terminal (for example, a glasses-type device, a watch-type device, or a head-mounted display (HMD)), another type of computer, or a communication platform.
  • The user terminal 200 receives an input operation from the user and sends it to the document information evaluation device 100 via the network 300.
  • the network 300 may include a wireless network or a wired network.
  • The network 300 includes a wireless LAN (WLAN), a wide area network (WAN), ISDN (Integrated Services Digital Network), LTE (Long Term Evolution), LTE-Advanced, 4th generation (4G), 5th generation (5G), CDMA (code division multiple access), and the like.
  • The network 300 is not limited to these examples, and may be, for example, a public switched telephone network (PSTN), Bluetooth (registered trademark), an optical line, an ADSL (Asymmetric Digital Subscriber Line) line, a satellite communication network, or the like. Further, the network 300 may be a combination of these.
  • When the information handled is confidential, the connection between the user terminal 200 and the document information evaluation device 100 and the connection between the preceding document information database 400 and the document information evaluation device 100 must use a communication environment of the network 300 that is superior in terms of security. In this case, a dedicated line may be prepared for these connections to enhance security.
  • The document information evaluation device 100 includes a processor 101, a memory 102, a storage 103, an input / output interface (I / F) 104, and a communication I / F 105, and realizes the functions and methods described in the present embodiment through their cooperation. For example, the function or method of the present disclosure is realized by the processor 101 executing an instruction included in a program read into the memory 102.
  • the processor 101 executes a function and / or a method realized by a code or an instruction contained in a program stored in the storage 103.
  • The processor 101 includes, for example, a central processing unit (CPU), an MPU (Micro Processing Unit), a GPU (Graphics Processing Unit), a microprocessor, a processor core, a multiprocessor, an ASIC (Application-Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), and the like, and each process disclosed in the embodiments may be realized by a logic circuit (hardware) or a dedicated circuit formed in an integrated circuit (an IC (Integrated Circuit) chip, an LSI (Large Scale Integration)) or the like.
  • These circuits may be realized by one or a plurality of integrated circuits, and a plurality of processes shown in each embodiment may be realized by one integrated circuit.
  • LSI may be referred to as VLSI, super LSI, ultra LSI, or the like depending on the degree of integration.
  • the memory 102 temporarily stores the program loaded from the storage 103 and provides a work area to the processor 101. Various data generated while the processor 101 is executing the program are also temporarily stored in the memory 102.
  • The memory 102 includes, for example, a RAM (Random Access Memory), a ROM (Read Only Memory), and the like.
  • the storage 103 stores the program.
  • the storage 103 includes, for example, an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory, and the like.
  • the communication I / F 105 is implemented as hardware such as a network adapter, communication software, and a combination thereof, and transmits / receives various data via the network 300.
  • the communication may be executed by wire or wirelessly, and any communication protocol may be used as long as mutual communication can be executed.
  • the communication I / F 105 executes communication with another information processing device such as a user terminal via the network 300.
  • the communication I / F 105 transmits various data to other information processing devices according to instructions from the processor 101. Further, the communication I / F 105 receives various data transmitted from other information processing devices and transmits them to the processor 101.
  • the input / output I / F 104 includes an input device for inputting various operations to the document information evaluation device 100 and an output device for outputting the processing result processed by the document information evaluation device 100.
  • the input / output I / F 104 may be integrated with the input device and the output device, or may be separated into the input device and the output device.
  • the input device is realized by any one of all kinds of devices capable of receiving an input from a user and transmitting information related to the input to the processor 101, or a combination thereof.
  • The input device includes, for example, a touch panel, a touch display, hardware keys such as a keyboard, a pointing device such as a mouse, a camera (operation input via images), and a microphone (operation input by voice).
  • the output device outputs the processing result processed by the processor 101.
  • the output device includes, for example, a touch panel, a speaker, and the like.
  • each functional unit shown in FIG. 1 is not essential, and other functional units may be provided. Further, the functions or processes of each functional unit may be realized by machine learning or AI (Artificial Intelligence) to the extent feasible.
  • the user terminal 200 includes a communication control unit 210, a display control unit 220, an input control unit 230, and a storage control unit 240.
  • the communication control unit 210 causes the user terminal 200 to transmit and receive various information between the user terminal 200 and the external device (document information evaluation device 100) via the network 300.
  • the display control unit 220 controls the display of data on a display screen such as a display or a touch panel.
  • the input control unit 230 receives an input operation from the user via the keyboard, touch panel, or microphone.
  • the input control unit 230 includes a document information input unit 231, a search formula change unit 232, and a selection input unit 233.
  • the document information input unit 231 accepts input of document information to be evaluated from the user.
  • the search expression change unit 232 receives an operation input related to the change of the search expression from the user.
  • The selection input unit 233 receives from the user an operation input for selecting items in the preceding document information. Further, the selection input unit 233 receives from the user an operation input for selecting a search expression. These user operation inputs at the user terminal 200 will be described later.
  • the storage control unit 240 stores a program and user information for operating the user terminal 200 on the document information evaluation system 500. Further, the storage control unit 240 may store the downloaded data of the evaluation result of the document information using the document information evaluation device 100.
  • the document information evaluation device 100 is a device for connecting to a user terminal 200 via a network 300 and providing the service of the document information evaluation system 500 to the user terminal 200.
  • the document information evaluation device 100 may be, for example, a so-called server device or a computer (for example, a desktop, a laptop, a tablet, etc.), but is not limited thereto.
  • the document information evaluation device 100 may be a device realized by one housing or a system realized by a plurality of devices connected via a network or the like.
  • the document information evaluation device 100 may realize a part or all of its functions by a virtual device such as a cloud service provided by a cloud computing system. That is, the document information evaluation device 100 may realize at least one or more of the functional units described below in another device.
  • the document information evaluation device 100 has each functional unit of an acquisition unit 110, a generation unit 120, a calculation unit 130, an extraction unit 140, an output unit 150, a communication control unit 160, and a storage control unit 170.
  • Each of the functional units of the document information evaluation device 100 in the present embodiment will be described as a functional module realized by an information processing program (software) that controls the document information evaluation device 100.
  • the document information evaluation program operates on the document information evaluation device 100. That is, the document information evaluation device 100 refers to a device on which the document information evaluation program operates.
  • the communication control unit 160 transmits and receives various information between the document information evaluation device 100 and the external device (user terminal 200) via the network 300.
  • The storage control unit 170 stores an information processing program (software) that controls the document information evaluation device 100. Further, the storage control unit 170 may include a synonym database (DB) 171.
  • the acquisition unit 110 acquires the document information input by the user terminal 200 from the user terminal 200.
  • the generation unit 120 decomposes the document information acquired from the user terminal 200 into predetermined constituent units (Elements), and generates a search expression for each constituent unit based on the keywords included in the constituent units as a search query.
  • A constituent unit is each part that constitutes the document information; when the document information is composed of a plurality of sentences, a constituent unit may be a sentence segmented by punctuation marks, by a certain length, or by predicate.
  • the decomposition of the structural unit may be performed based on the context or the sentence structure.
  • the generation unit 120 may extract the parallel structure of the document from the dependency relation of the clauses constituting the document by morphological analysis, syntactic analysis, or the like, and obtain the constituent unit.
  • For example, when a user wants to evaluate the similarity of his or her invention to preceding documents, the user can send a document listing the constituent requirements of the invention to the document information evaluation device 100 as document information.
  • the division of the constituent unit may be performed for each constituent requirement.
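As a rough illustration of the simplest option mentioned above, segmentation by punctuation marks, the following sketch splits a text into candidate constituent units. The function name `split_into_units` and the chosen delimiter set are assumptions for illustration; the source also allows segmentation by length, by predicate, or by syntactic analysis, none of which is shown here.

```python
import re

# Assumed delimiter set: common Western and Japanese punctuation marks.
_DELIMITERS = r"[.;,。、；，]"


def split_into_units(text: str) -> list[str]:
    """Split text at punctuation marks and drop empty fragments."""
    parts = re.split(_DELIMITERS, text)
    return [part.strip() for part in parts if part.strip()]
```

For a document that lists constituent requirements one per clause, each returned string would correspond to one requirement.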
  • the keyword included in the constituent unit refers to a word or a synonym thereof included in the sentence as the constituent unit.
  • FIG. 3 is an example of a display screen (search expression generation screen 60) of the user terminal 200 using the document information evaluation system 500 according to the embodiment of the present invention.
  • the user inputs the document information to be evaluated in the document information input area 61 of the search expression generation screen 60.
  • In this example, the sentence "an automatic athletic meet holding system that holds an athletic meet when it is fine according to today's weather and notifies the participants by a predetermined notification method" is input by the user.
  • The document information may be input directly into the document information input area (text box) 61 as described above, or may be input by selecting a path to the document to be evaluated.
  • When the search expression generation button 62 is selected by the user, the document information input unit 231 of the user terminal 200 receives the document information input to the document information input area 61, and the communication control unit 210 transmits it to the document information evaluation device 100.
  • the generation unit 120 of the document information evaluation device 100 decomposes the document information acquired from the user terminal 200 into predetermined structural units.
  • In this example, the constituent units are (i) “according to today's weather”, (ii) “conduct an athletic meet when it is sunny”, (iii) “notify participants by a predetermined notification method”, and (iv) “automatic athletic meet holding system”.
  • the generation unit 120 generates a search expression for each configuration unit based on the keywords included in the configuration unit as a search query.
  • For example, the keywords included in the constituent unit (i) are “today (tomorrow, today)”, “weather (climate)”, and “therefore”.
  • For the extraction of keywords, existing methods such as morphological analysis and N-gram may be used.
  • The words in parentheses are synonyms for the words included in the constituent units.
  • the synonyms may be obtained from the existing synonym database 171 or from a corpus (not shown) via the network 300.
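The N-gram extraction and synonym lookup mentioned above can be sketched as follows. The `SYNONYMS` dictionary is a toy stand-in for the synonym database 171, and the function names `word_ngrams` and `expand_with_synonyms` are hypothetical; tokenization by morphological analysis is assumed to have been done already.

```python
# Toy stand-in for the synonym database 171 (contents are illustrative only).
SYNONYMS: dict[str, list[str]] = {
    "weather": ["climate"],
    "notify": ["inform"],
}


def word_ngrams(tokens: list[str], n: int) -> list[str]:
    """Return all contiguous n-token sequences, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def expand_with_synonyms(keyword: str) -> list[str]:
    """Return the keyword followed by any synonyms found in the dictionary."""
    return [keyword] + SYNONYMS.get(keyword, [])
```

A keyword with no dictionary entry is returned unchanged, so the search formula can still be generated for it.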
  • The generation unit 120 generates "2 * [today (tomorrow, today) <3> + weather (climate) <3> + therefore <1>]" as a search formula for the constituent unit (i).
  • The integer "2" at the beginning is the minimum number of keywords, among those enclosed in square brackets, that must be included in (match) the preceding document information. This minimum number is set according to the length of the constituent unit.
  • the numerical value in the angle brackets after the keyword is the weight of the keyword to which the angle brackets are added, and is used when calculating the score indicating the degree of matching between the preceding document information and the document information.
  • the weight may be set by a method such as TF-IDF, Okapi BM25, etc., which determines the importance of the words contained in the document.
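As one possible way to obtain such weights, the sketch below computes a smoothed TF-IDF value for a term. It is a minimal sketch, not the method of the source: the function name `tf_idf`, the particular smoothing, and the token-list representation of documents are assumptions, and the source does not say how the resulting value is mapped to the integer weights shown in the search formula.

```python
import math
from collections import Counter


def tf_idf(term: str, doc_tokens: list[str], corpus: list[list[str]]) -> float:
    """Smoothed TF-IDF of `term` in one tokenized document of a tokenized corpus."""
    if not doc_tokens:
        return 0.0
    # Term frequency: share of the document's tokens equal to `term`.
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    # Document frequency: number of documents containing `term`.
    df = sum(1 for tokens in corpus if term in tokens)
    # Smoothed inverse document frequency (avoids division by zero).
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1
    return tf * idf
```

Terms that occur in few documents receive a larger value than terms that occur in many, which matches the idea of weighting by importance.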
  • the output unit 150 outputs the search expression information for displaying the search expression generated by the generation unit 120 for each configuration unit to the user terminal 200. As shown in FIG. 3, the generated search expression is displayed in the search expression input area 63 for each of the structural units (i) to (iv) on the search expression generation screen 60 of the user terminal 200.
  • the search formula generation method is not limited to the above.
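To make the structure of the example search formula concrete (a minimum match count, followed by bracketed keywords with synonyms in parentheses and a weight in angle brackets), the following sketch models it as a small data structure and renders it as text. The class names `Term` and `SearchFormula` and the `render` method are hypothetical; the source does not describe an internal representation.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    keyword: str
    weight: int
    synonyms: list[str] = field(default_factory=list)


@dataclass
class SearchFormula:
    min_match: int      # minimum number of keywords that must match
    terms: list[Term]   # keywords of one constituent unit

    def render(self) -> str:
        """Render in the notation of the example: N * [kw (syn, ...) <w> + ...]."""
        parts = []
        for term in self.terms:
            synonyms = f" ({', '.join(term.synonyms)})" if term.synonyms else ""
            parts.append(f"{term.keyword}{synonyms} <{term.weight}>")
        return f"{self.min_match} * [" + " + ".join(parts) + "]"
```

Keeping the formula as structured data rather than a string would make it straightforward to apply a change that the user makes in the search expression input area and then re-render it.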
  • Based on the search query, the calculation unit 130 calculates as a score, for each constituent unit, the degree of matching with a plurality of pieces of preceding document information stored in a predetermined storage unit (preceding document information database 400) (the score calculation process will be described later).
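As a simplified illustration of how a minimum match count and keyword weights could be combined into a score for one constituent unit, consider the sketch below. This is an assumed simplification and not the score calculation process of the source, which is described separately; the function name `score_unit`, the substring matching, and the rule "sum of matched weights, or zero below the minimum" are all assumptions.

```python
def score_unit(min_match: int, terms: list[tuple[list[str], int]], doc_text: str) -> int:
    """Score one constituent unit against one preceding document.

    `terms` is a list of (variants, weight) pairs, where `variants` holds a
    keyword together with its synonyms. A term matches if any variant occurs
    in the document text (case-insensitive substring match, an assumption).
    """
    text = doc_text.lower()
    matched_weights = [
        weight
        for variants, weight in terms
        if any(variant.lower() in text for variant in variants)
    ]
    # Below the minimum number of matching keywords, the unit does not count.
    if len(matched_weights) < min_match:
        return 0
    return sum(matched_weights)
```

Because synonyms are grouped with their keyword, a document that uses "climate" instead of "weather" still earns the weight of that term once.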
  • the extraction unit 140 extracts the preceding document information (hereinafter, also referred to as “similar document information”) whose score calculated by the calculation unit 130 satisfies a predetermined criterion from the preceding document information database 400.
  • the output unit 150 outputs the extracted preceding document information (that is, similar document information) to the user terminal 200.
  • FIG. 4 shows an example of the display screen of the user terminal 200 of the extracted similar document information.
  • the evaluation result screen 70 includes a list area 71 for similar document information.
  • the similar documents to be extracted may be a predetermined number (for example, 5) taken in descending order of similarity (that is, in descending order of matching score). Alternatively, a threshold value may be set in advance, and similar document information with a score equal to or higher than that threshold may be extracted.
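The two selection strategies can be sketched as follows. The function name `select_similar` and the document identifiers are hypothetical, and the sketch assumes that the scores have already been calculated.

```python
def select_similar(scores, top_n=None, threshold=None):
    """Return document ids in descending order of score.

    scores: dict mapping document id -> matching score.
    top_n: keep at most this many documents (None = no limit).
    threshold: keep only documents scoring at or above it (None = no limit).
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(doc, s) for doc, s in ranked if s >= threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [doc for doc, _ in ranked]


scores = {"D1": 0.9, "D2": 0.4, "D3": 0.7}
top_two = select_similar(scores, top_n=2)
above_half = select_similar(scores, threshold=0.5)
```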
  • when similar document information in the list area 71 is selected by the user, the details of the selected similar document information are displayed in the detailed display area 72.
  • the extracted similar document information may be downloaded to the user terminal 200 at the user's option.
  • the calculation unit 130 creates a kNN graph (S101).
  • the kNN graph is created by the following procedure.
  • the calculation unit 130 vectorizes all the sentences included in the document information acquired from the user terminal 200 by the acquisition unit 110 and in the preceding document information stored in the preceding document information database 400.
  • Vectorization may be performed by conventional techniques such as Word2Vec, Doc2Vec (Paragraph2Vec), LDA (Latent Dirichlet Allocation), NTSG (Neural Tensor Skip-Gram), Bag of Words, and the like.
  • the calculation unit 130 creates a distance matrix between sentences from the vectors, uses each sentence as a vertex, and draws edges from each sentence to the k sentences that are closest to it.
  • the kNN graph is created by the above procedure. Although the unit is described above as a sentence, it may instead be a combination of a plurality of phrases, a phrase, or a word.
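The procedure above can be sketched in plain Python. This is a minimal illustration that assumes Bag of Words vectors and cosine distance. The text allows other vectorization methods and does not name the distance measure, and the function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Vectorize a sentence as word counts (Bag of Words)."""
    return Counter(sentence.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def knn_graph(sentences, k):
    """Return {vertex index: [indices of its k nearest sentences]}."""
    vectors = [bag_of_words(s) for s in sentences]
    n = len(vectors)
    # Distance matrix between all pairs of sentences.
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    graph = {}
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist[i][j])
        graph[i] = others[:k]  # edges to the k closest sentences
    return graph


sentences = [
    "distributed processing in a distributed file system",
    "a distributed file system stores data",
    "the weather today is fine",
    "fine weather is expected today",
]
graph = knn_graph(sentences, k=1)
```

The full distance matrix grows quadratically with the number of sentences, so a large preceding document database would likely need an approximate nearest-neighbour method instead.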
  • the calculation unit 130 sets, as the output target of similar document information for the document information to be evaluated that was acquired by the acquisition unit 110, all the sentences included in the preceding document information (S102), and the extraction unit 140 performs the extraction (S103).
  • the extraction may be performed by a conventional technique such as ElasticSearch (registered trademark).
  • the extraction unit 140 sets the sentence with the highest score as the start point as a result of extraction (S104), adds the start point to the final output result (S105), and repeats until the final output result becomes n or more (S106). If the number of cases is less than n, the process proceeds to S107, and if the number of cases is n or more, the process proceeds to S110.
  • the calculation unit 130 extracts candidates for the query conversion rule (S107).
  • Candidates for query conversion rules are extracted by the following procedure. First, based on the created kNN graph, sentences similar to the sentence set as the starting point are extracted. Then, words recognized as having high importance are extracted from the sentence set as the starting point and from the extracted similar sentences. The degree of importance may be determined by a conventional technique such as the TF-IDF method. For each extracted word, the adjacent words are acquired in the sentence set as the starting point and in the extracted similar sentences. For example, when the extracted word is "distributed" and the sentence is "distributed processing in a distributed file system", the adjacent words are "file" and "processing".
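The adjacent-word step can be sketched as follows. The stop-word filter is an assumption introduced so that the sketch reproduces the "file" and "processing" example. The text does not say how function words such as "a" are handled, and the names `STOP_WORDS` and `adjacent_words` are hypothetical.

```python
# Assumed, illustrative stop-word list; not taken from the source.
STOP_WORDS = {"a", "an", "the", "in", "of", "on", "to"}


def adjacent_words(sentence, target):
    """Collect the words directly before and after each occurrence of target,
    skipping stop words."""
    tokens = sentence.lower().split()
    neighbours = set()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        for j in (i - 1, i + 1):
            if 0 <= j < len(tokens) and tokens[j] not in STOP_WORDS:
                neighbours.add(tokens[j])
    return sorted(neighbours)


result = adjacent_words(
    "distributed processing in a distributed file system", "distributed"
)
```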
  • the calculation unit 130 applies a conversion rule having a high score to the query (S108).
  • the number of conversion rules applied may be one or more, and the number may be controlled by the calculation unit 130. Further, the number of new queries calculated by the conversion rules may be controlled by the calculation unit 130 based on the user's evaluation information regarding the evaluation result of the document information to be evaluated that was acquired by the acquisition unit 110.
  • the score can be calculated by the following formula.
  • here, the sentence set as the start point is A, the adjacent word acquired in the sentence set as the start point is w1, the extracted similar sentence is B, the adjacent word acquired in the extracted similar sentence is w2, and P(w, X) is the probability of appearance of the word w in the sentence X.
  • Similarity is an index of the semantic closeness of words, and the larger this value is, the more semantically similar the two words are.
  • the similarity can be a value calculated by NLTK, a Python package, based on the path length in WordNet.
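A path-based similarity of this kind can be illustrated without WordNet data. The sketch below uses a small hypothetical taxonomy and the rule 1 / (1 + shortest path length), which is how NLTK's WordNet path similarity is commonly described. The taxonomy and function names are assumptions for illustration and are not taken from the source.

```python
# Hypothetical toy taxonomy: child -> parent (hypernym).
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}


def path_to_root(word):
    """Return the chain from word up to the root of the taxonomy."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def path_similarity(w1, w2):
    """1 / (1 + number of edges on the shortest path between w1 and w2)."""
    p1, p2 = path_to_root(w1), path_to_root(w2)
    depth2 = {node: d for d, node in enumerate(p2)}
    for d1, node in enumerate(p1):
        if node in depth2:  # lowest common ancestor
            return 1.0 / (1 + d1 + depth2[node])
    return None  # no common ancestor


sim_near = path_similarity("dog", "wolf")
sim_far = path_similarity("dog", "cat")
```

With real WordNet data, a word can have several senses and several hypernyms, so an actual implementation would need to choose among senses. The toy tree above avoids that complication.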
  • the calculation unit 130 sets a sentence adjacent to the start point as the next output target (S109), and outputs again using the query newly calculated by S108 (returns to S103).
  • the extraction unit 140 may extract the document information including the sentence that is the starting point as the final output result.
  • the search formula used for the search is automatically generated, and the evaluation of the document information (a search for similar documents) is performed, by only the simple operation of inputting the document information to be evaluated. Therefore, it is possible to provide a document information evaluation system that is easy for the user to use. Further, since the generated search expression is presented to the user, the user can confirm in advance what kind of search expression is used for the search, and can confirm the validity of the search.
  • the user can change (edit) the search expression displayed on the user terminal 200.
  • the user can change the search expression displayed in each text box 65.
  • the search expression change unit 232 accepts the change of the search expression as described above and, when the search button 66 is selected by the user, outputs the information regarding the changed search expression (that is, the information regarding the operation input related to the change of the search expression) to the document information evaluation device 100.
  • the acquisition unit 110 of the document information evaluation device 100 acquires information regarding the operation input related to the change of the search expression in the user terminal 200, and the extraction unit 140 uses the search expression changed by the operation input as a search query to extract preceding document information that satisfies a predetermined criterion. That is, the score calculation process described above is performed using the changed search query.
  • the search history for each search query may be associated with the user and stored in the user terminal 200 or the document information evaluation device 100. Further, when there is an error in the search expression edited by the user, the search expression change unit 232 may notify the user by displaying the search expression in red or by displaying a pop-up indicating that there is an error.
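One way to detect an error in an edited search expression is to parse it and reject anything that does not fit the expected notation. The sketch below assumes the notation of the earlier example, "N * [keyword (synonyms) <weight> + ...]". The grammar, the regular expressions, and the function name `parse_formula` are assumptions for illustration only.

```python
import re

# Assumed grammar: "<min> * [term + term + ...]".
FORMULA_RE = re.compile(r"^\s*(\d+)\s*\*\s*\[(.+)\]\s*$")
# Assumed term: keyword, optional "(synonym, ...)", then "<weight>".
TERM_RE = re.compile(r"^\s*([^()<>+\[\]]+?)\s*(?:\(([^()]*)\))?\s*<(\d+)>\s*$")


def parse_formula(text):
    """Parse a search formula; raise ValueError if it is malformed."""
    m = FORMULA_RE.match(text)
    if not m:
        raise ValueError("expected '<min> * [term + term ...]'")
    min_match = int(m.group(1))
    terms = []
    for raw in m.group(2).split("+"):
        t = TERM_RE.match(raw)
        if not t:
            raise ValueError(f"malformed term: {raw.strip()!r}")
        synonyms = [s.strip() for s in (t.group(2) or "").split(",") if s.strip()]
        terms.append((t.group(1), synonyms, int(t.group(3))))
    if min_match > len(terms):
        raise ValueError("minimum match count exceeds number of terms")
    return min_match, terms


parsed = parse_formula(
    "2 * [today (tomorrow, today) <3> + weather (climate) <3> + therefore <1>]"
)
```

A user interface could catch the `ValueError` and use its message for the pop-up or red highlighting mentioned above.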
  • the user can change the search expression generated by the document information evaluation device 100. Therefore, the user can set a more appropriate search expression while examining the search result for each search expression.
  • the prior document information is composed of a plurality of items such as claims, detailed description of the invention, abstract, and drawings.
  • the user can select an item to be searched by a search formula from a plurality of items of prior document information.
  • the output unit 150 outputs the search expression information that displays a check box in which a plurality of items in the preceding document information for calculating the degree of matching with the document information can be selected together with the search expression for each configuration unit.
  • the display control unit 220 displays a check box 64 with which an item can be selected, as in the search expression generation screen 60 of FIG. 3. In the example of FIG. 3, a check box 64 with which "claims", "detailed description of the invention", "full text", and "summary" can be selected as items is displayed, but the items are not limited to these.
  • when the selection input unit 233 receives the selection input by the user, it outputs the selection input to the document information evaluation device 100.
  • the acquisition unit 110 of the document information evaluation device 100 acquires item selection information related to item selection in the preceding document information from the user terminal.
  • the document information evaluation device 100 performs a similar document information extraction process based on the user's selection described above.
  • a check box in which an item to be searched by a search formula can be selected from a plurality of items of prior document information is displayed on the user terminal. Since the selection is made by a check box, it is not necessary to input the same search expression multiple times for each item, unlike the case where the item is selected by the pull-down method, for example. Further, since the item to be searched can be specified for each search expression, it becomes possible to easily perform the search according to the purpose of the user.
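If the extraction were delegated to Elasticsearch, which the text names as one conventional option, the selected items and the search formula could be combined into a single query body. The sketch below only builds a Python dictionary in the Elasticsearch query DSL (`bool` / `should` / `minimum_should_match` with `multi_match` clauses). The field names and the function `build_query` are hypothetical, and mapping the formula's minimum count and weights onto `minimum_should_match` and `boost` is an assumption rather than the method of the source.

```python
def build_query(min_match, terms, fields):
    """Build an Elasticsearch-style query body.

    min_match: minimum number of keywords that must match.
    terms: list of (keyword, weight) pairs from the search formula.
    fields: items ticked by the user, e.g. ["claims", "abstract"].
    """
    should = [
        {"multi_match": {"query": keyword, "fields": fields, "boost": weight}}
        for keyword, weight in terms
    ]
    return {
        "query": {
            "bool": {"should": should, "minimum_should_match": min_match}
        }
    }


body = build_query(
    2,
    [("today", 3), ("weather", 3), ("therefore", 1)],
    ["claims", "abstract"],  # hypothetical field names
)
```

Because the field list is a parameter, the same formula can be reused for any combination of ticked items without being entered again.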
  • the user can select a search expression to be used as a search query from the constituent units.
  • the output unit 150 of the document information evaluation device 100 outputs the search expression information that displays the search expression used as the search query in a selectable manner. Based on the search expression information, in the example of FIG. 3, on the search expression generation screen 60, the user can select the search expression to be used for the search query by the check box 67.
  • the acquisition unit 110 acquires the search expression selection information regarding the selection of the search expression from the user terminal 200, and the extraction unit 140 uses the selected search expression as a search query based on the search expression selection information and satisfies a predetermined criterion. Extract the preceding document information.
  • the user can easily select which of the generated search expressions to use as the search query. Therefore, for example, when the user wants to evaluate a constituent unit that the user considers more characteristic, the user only needs to select it with a check box, without deleting the other search expressions or re-entering the document information, which is very convenient.
  • FIG. 5 shows an example of a display screen of a comparison table in the user terminal 200.
  • the comparison table display screen 80 includes a structural unit display area 82, a preceding document information and similarity display area 83, and a determination result display area 84 showing the result of determining the patentability of the invention related to the document information input by the user.
  • the preceding document information and similarity display area 83 includes similar documents 83A to 83C.
  • the preceding document information and similarity display area 83 may be configured so that the preceding document information can be viewed according to the user's selection.
  • the evaluation result of the document information is not limited to the above-mentioned comparison table.
  • it may be a mock notice of reasons for refusal (a mock notice resembling a notice of reasons for refusal), information on intellectual property related as an inventor or an applicant, and the like.
  • the information regarding the intellectual property related to the inventor or the applicant is the invention memo or claim information in which the invention information is described.
  • information about similar documents is provided to the user in the form of a comparison table for each constituent unit. Therefore, the user can immediately determine the configuration requirements having a high degree of matching with similar documents, and can provide a system with higher usability.
  • the acquisition unit 110 acquires the document information input by the user terminal 200 from the user terminal 200 (step S11).
  • the generation unit 120 decomposes the document information into predetermined constituent units, and generates, as a search query, a search formula for each constituent unit based on the keywords included in the constituent unit (step S12).
  • the output unit 150 outputs the search expression information for displaying the search expression generated by the generation unit 120 for each configuration unit to the user terminal 200 (step S13).
  • the acquisition unit 110 determines whether or not a search request has been acquired from the user terminal 200 (step S14).
  • the search request is transmitted from the user terminal 200, for example, when the search button 66 is selected on the search expression generation screen 60 of FIG. 3. If the search request has not been acquired, the process waits in step S14.
  • the extraction unit 140 extracts the preceding document information whose score, calculated for each constituent unit and indicating the degree of matching with the preceding document information, meets a predetermined criterion (step S15). After that, the output unit 150 outputs the extracted preceding document information to the user terminal 200 (step S16).
  • each component, each step, etc. can be rearranged so as not to be logically inconsistent, and a plurality of components, steps, etc. can be combined into one or divided. Further, the configurations shown in the above embodiments may be appropriately combined.
  • each component described as being included in the document information evaluation device 100 may be physically distributed by a plurality of computers, or may be realized as one computer.
  • the calculation of the degree of matching (similarity) between the document information and the preceding document information is not limited to the above method.
  • for example, the calculation unit 130 may use a word corpus dictionary stored in advance in the storage control unit 170 to determine whether a word included in a constituent unit is a subordinate concept or a superordinate concept between the document information transmitted from the user terminal 200 and the preceding document information, and may calculate the similarity on that basis.
  • if a word included in the document information input by the user is the same as a word included in the preceding document information, or is a subordinate concept of it, the similarity between the preceding document information and the document information transmitted from the user terminal 200 may be calculated.
  • the method for calculating the degree of similarity is not limited to the above-mentioned method, and an existing clustering method can be used.
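The concept-based comparison can be sketched as follows. The dictionary `HYPERNYM` stands in for the word corpus dictionary and contains hypothetical entries. Measuring similarity as the fraction of input words that are the same as, or subordinate to, a word in the preceding document is an assumption, since the text does not give the calculation.

```python
# Hypothetical stand-in for the word corpus dictionary:
# word -> its superordinate concept.
HYPERNYM = {
    "smartphone": "terminal",
    "tablet": "terminal",
    "terminal": "device",
}


def is_same_or_subordinate(word, other):
    """True if word equals other or is a subordinate concept of other."""
    current = word
    while current is not None:
        if current == other:
            return True
        current = HYPERNYM.get(current)
    return False


def concept_similarity(input_words, prior_words):
    """Fraction of input words covered by a word in the preceding document."""
    if not input_words:
        return 0.0
    hits = sum(
        1
        for w in input_words
        if any(is_same_or_subordinate(w, p) for p in prior_words)
    )
    return hits / len(input_words)


full = concept_similarity(["smartphone", "display"], ["terminal", "display"])
half = concept_similarity(["smartphone", "battery"], ["device"])
```

The check is deliberately one-directional: an input word "device" does not match a preceding document that only mentions "smartphone".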
  • the document information evaluation device 100 may be applied to the intellectual property creation support system.
  • the document information evaluation device 100 may include a determination unit (not shown) for determining the possibility of acquiring a right.
  • the determination unit (not shown) can search for a similar invention similar to the invention by the user, and execute, for example, a process of determining the possibility of acquiring a right depending on the presence or absence of the similar invention.
  • a keyword is extracted from the words included in the invention, and synonyms and the like for the keyword are searched for in a database (not shown) that stores synonyms or derivative words (synonyms, etc.).
  • the determination unit may make a rank-based determination according to how high or low the possibility of acquiring the right is, for example "S rank (very likely)", "A rank (highly likely)", "B rank (possible)", and "C rank (less likely)". The determination is not limited to display from S rank to C rank; it may, for example, be displayed using symbols in descending order of probability.
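A rank-based determination of this kind amounts to mapping an estimated possibility onto labels. The thresholds below are hypothetical values chosen only for illustration. The text does not state how the boundaries between ranks are set.

```python
# Hypothetical lower bounds for each rank, checked from highest to lowest.
RANK_THRESHOLDS = [(0.75, "S"), (0.5, "A"), (0.25, "B")]


def rank(probability):
    """Map an estimated possibility of acquiring the right (0..1) to a rank."""
    for lower_bound, label in RANK_THRESHOLDS:
        if probability >= lower_bound:
            return label
    return "C"


ranks = [rank(p) for p in (0.9, 0.5, 0.3, 0.1)]
```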
  • the judgment unit can judge the possibility of acquisition of rights based on the examination results of acquisition of rights that have been examined in the past by the patent offices of each country.
  • here, an examination result of acquisition of rights consists of the invention related to the application, the cited document, and the result of the examination comparing the two (whether or not the application was rejected based on the cited document).
  • the determination unit may calculate the similarity between the text of the invention of the application and the text of the cited document, learn the relationship between the calculated similarity and the examination result, and determine the possibility of acquiring the right.
  • by learning the relationship between the calculated similarity and the past examination results, the determination unit can use the judgments made by the JPO in the past as its determination criteria, which can improve the determination accuracy.
  • the storage control unit 170 may be configured to store the examination result in advance.
  • the examination result can be obtained, for example, from the examination information published by the patent offices of each country.
  • the determination unit may determine the possibility of acquiring the right based on the examination result.
  • the judgment unit may machine-learn the past examination results and judge the possibility of acquiring the right.
  • the determination unit performs machine learning (supervised learning) using, as a data set, pairs of input and output in which the invention related to the application and the cited document are the input and the examination result is the output.
  • the determination unit can improve the accuracy of determination regarding the possibility of acquiring rights by using the learning results learned in each modeling.
  • by machine learning new examination results, the determination unit can judge the possibility of acquiring rights in a way that follows changes in the examination tendency at the JPO.
  • as the machine learning, a supervised learning technique or an unsupervised learning technique may be used.
  • as the learning technique of the machine learning, for example, a neural network (including deep learning), a support vector machine, clustering, a Bayesian network, or the like may be used.
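The idea of learning from past examination results can be illustrated with a deliberately simple supervised learner. The sketch below fits a one-feature threshold classifier (a decision stump), which is not one of the techniques named in the text. It stands in for them only to show the shape of the data: the input is the similarity between an application and its cited document, and the label is whether the application was rejected. The training data is invented for illustration.

```python
def fit_threshold(samples):
    """Learn the similarity threshold that best separates past results.

    samples: list of (similarity, rejected) pairs, where rejected is a bool.
    Predicts "rejected" when similarity >= threshold and returns the
    threshold with the fewest training errors.
    """
    best_threshold, best_errors = None, None
    for candidate in sorted({s for s, _ in samples}):
        errors = sum((s >= candidate) != rejected for s, rejected in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


def predict_rejected(similarity, threshold):
    """Predict whether an application would be rejected."""
    return similarity >= threshold


# Invented toy data: (similarity to cited document, was rejected).
past_results = [
    (0.9, True), (0.8, True), (0.6, True),
    (0.4, False), (0.3, False), (0.2, False),
]
threshold = fit_threshold(past_results)
```

Refitting on newly published examination results would shift the threshold, which is the simplest form of following a change in examination tendency.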
  • the display screen of the user terminal 200 described above is an example, and is not limited to this.
  • the search expression generation screen 60 may have an area for inputting or selecting information about an applicant and a patent classification (IPC). As a result, the search range can be limited and noise can be reduced.
  • the program of each embodiment of the present disclosure may be provided in a state of being stored in a storage medium readable by a computer.
  • the storage medium can store the program in a "non-transitory tangible medium".
  • Programs include, for example, software programs and computer programs.
  • the storage medium may include one or more semiconductor-based or other integrated circuits (ICs) (e.g., field-programmable gate arrays (FPGAs), application-specific ICs (ASICs), etc.), or hard disks.
  • the storage medium may be volatile, non-volatile, or a combination of volatile and non-volatile, where appropriate.
  • the program of the present disclosure may be provided to the information processing apparatus via an arbitrary transmission medium (communication network, broadcast wave, etc.) capable of transmitting the program.
  • each embodiment of the present disclosure can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.
  • the program of the present disclosure is implemented using, for example, a programming language such as JavaScript (registered trademark), Python, C, Go, Swift, Kotlin, or Java (registered trademark).
  • 100 Document information evaluation device; 101 Processor; 102 Memory; 103 Storage; 110 Acquisition unit; 120 Generation unit; 130 Calculation unit; 140 Extraction unit; 150 Output unit; 160 Communication control unit; 170 Storage control unit; 171 Synonyms database; 200 User terminal; 210 Communication control unit; 220 Display control unit; 230 Input control unit; 231 Document information input unit; 232 Search formula change unit; 233 Selection input unit; 300 Network; 400 Preceding document information database; 500 Document information evaluation system

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a document information evaluation device and the like that can reduce search omissions with respect to a search condition. A document information evaluation device according to one embodiment of the present invention comprises: an acquisition unit that acquires, from a user terminal, document information input via the user terminal; a generation unit that decomposes the document information into prescribed constituent units and generates, as a search query, a search formula for each constituent unit based on a keyword included in the constituent unit; an extraction unit that calculates, as a score for each constituent unit and based on the search query, a degree of matching with a plurality of pieces of preceding document information stored in a prescribed storage unit, and extracts preceding document information for which the score satisfies a prescribed criterion; and an output unit that outputs, to the user terminal, the preceding document information extracted by the extraction unit. The output unit outputs, to the user terminal, search formula information for displaying, for each constituent unit, the search formula generated by the generation unit.
PCT/JP2020/021848 2020-06-02 2020-06-02 Document information evaluation device, document information evaluation method, and document information evaluation program WO2021245814A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2022529199A JPWO2021245814A1 (fr) 2020-06-02 2020-06-02
PCT/JP2020/021848 WO2021245814A1 (fr) 2020-06-02 2020-06-02 Document information evaluation device, document information evaluation method, and document information evaluation program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/021848 WO2021245814A1 (fr) 2020-06-02 2020-06-02 Document information evaluation device, document information evaluation method, and document information evaluation program

Publications (1)

Publication Number Publication Date
WO2021245814A1 true WO2021245814A1 (fr) 2021-12-09

Family

ID=78830690

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/021848 WO2021245814A1 (fr) 2020-06-02 2020-06-02 Document information evaluation device, document information evaluation method, and document information evaluation program

Country Status (2)

Country Link
JP (1) JPWO2021245814A1 (fr)
WO (1) WO2021245814A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06325093A (ja) * 1993-05-17 1994-11-25 Hitachi Ltd Document retrieval method
JP2001142897A (ja) * 1999-11-15 2001-05-25 Ricoh Co Ltd Document retrieval device, document retrieval method, document retrieval system, and computer-readable recording medium recording a program for executing the document retrieval method
JP2002351896A (ja) * 2001-05-29 2002-12-06 Sharp Corp Patent search device and patent search method
JP2006202159A (ja) * 2005-01-21 2006-08-03 Nec Corp Information providing system, information providing method, and program therefor
JP2015203961A (ja) * 2014-04-14 2015-11-16 株式会社toor Document extraction system
JP2018503917A (ja) * 2015-02-02 2018-02-08 Alibaba Group Holding Limited Method and apparatus for keyword-based text search

Also Published As

Publication number Publication date
JPWO2021245814A1 (fr) 2021-12-09

Similar Documents

Publication Publication Date Title
US11977570B2 (en) Methods, systems, and computer-readable media for semantically enriching content and for semantic navigation
US20240104127A1 (en) Method and system for sentiment analysis of information
US10318564B2 (en) Domain-specific unstructured text retrieval
US20160342578A1 (en) Systems, Methods, and Media for Generating Structured Documents
JP2020123318A (ja) Method, apparatus, electronic device, computer-readable storage medium, and computer program for determining text relevance
WO2021189951A1 (fr) Text search method and apparatus, and computer device and storage medium
US9122680B2 (en) Information processing apparatus, information processing method, and program
JP6555704B1 (ja) Document information evaluation device, document information evaluation method, and document information evaluation program
CA3010817C (fr) Methods, systems, and computer-readable media for semantically enriching content and for semantic navigation
US20180285448A1 (en) Producing personalized selection of applications for presentation on web-based interface
CN112100396A (zh) Data processing method and apparatus
CN108984688B (zh) Method and apparatus for recommending maternal and infant knowledge topics
US10885081B2 (en) Systems and methods for contextual ranking of search results
JP7029204B1 (ja) Technical survey support device, technical survey support method, and technical survey support program
JP2021086580A (ja) Document information evaluation device, document information evaluation method, and document information evaluation program
JP2021086592A (ja) Document information evaluation device, document information evaluation method, and document information evaluation program
WO2021245814A1 (fr) Document information evaluation device, document information evaluation method, and document information evaluation program
WO2022252806A1 (fr) Information processing method and apparatus, device, and medium
US20140012854A1 (en) Method or system for semantic categorization
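JP7004951B1 (ja) Invention evaluation device, invention evaluation method, and invention evaluation program
JP2021128620A (ja) Document information evaluation device, document information evaluation method, and document information evaluation program
JP7193890B2 (ja) Document information evaluation device, document information evaluation method, and document information evaluation program
JP7323484B2 (ja) Information processing device, information processing method, and program
JP2022090289A (ja) Invention evaluation device, invention evaluation method, and invention evaluation program
JP2020173759A (ja) Document information evaluation device, document information evaluation method, and document information evaluation program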

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20938812

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022529199

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20938812

Country of ref document: EP

Kind code of ref document: A1