WO2021019831A1 - Management system and management method - Google Patents

Management system and management method

Info

Publication number
WO2021019831A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
server
patent document
search
data
Prior art date
Application number
PCT/JP2020/011838
Other languages
English (en)
Japanese (ja)
Inventor
篤志 久々宇
昌夫 後藤
明紀 関口
隆二 西出
裕一 間野
光司 目黒
忠紀 森口
Original Assignee
特許庁長官が代表する日本国
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed litigation Critical https://patents.darts-ip.com/?family=70413819&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=WO2021019831(A1) "Global patent litigation dataset” by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.
Application filed by 特許庁長官が代表する日本国
Publication of WO2021019831A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services

Definitions

  • This disclosure relates to a management system and a management method for managing patent documents.
  • the JPO uses a database for searching patent documents filed in the past in order to search for prior art at the time of examination of a patent application.
  • Information on patent documents filed in various countries around the world is stored in this search database, and the management system updates the information on the latest patent documents as appropriate.
  • the management system requires a huge amount of time to update the database.
  • Patent Document 1 discloses a recommendation system for presenting recommended books according to a user's taste in a system for managing a library collection.
  • This recommendation system generates a high-evaluation book list and a low-lending-frequency book list, and generates a list of books to present by extracting the books whose book ID is included in both lists.
  • a management system for registering information on patent documents in a search database is required to efficiently manage information on patent documents.
  • the purpose of the management system and management method is to efficiently manage information related to patent documents.
  • the management system is a management system for registering information on a plurality of patent documents in a search database, and collects data from a database owned by a patent office in the home country and a patent office in a plurality of other countries in a predetermined period.
  • the format information including at least the document number and the data that can uniquely identify the patent documents and the content information including at least the contents of the invention of each patent document are provided.
  • the first server that acquires format information for each patent document and the bibliographic information that includes data whose data format differs from country to country for patent documents filed or registered with patent offices in multiple countries.
  • the second server that acquires the document number and the bibliographic information converted into the common data format and the first server obtain the format information from the external database that has been converted into the common data format and stored.
  • each data included in the formal information is stored at a corresponding position in the first table in which each data item of the formal information and the bibliographic information is arranged in a predetermined order.
  • the format information and the bibliographic information are integrated by storing each data included in the bibliographic information at the corresponding position in the first table using the document number as a key.
  • the document number and content information of each patent document are acquired from the third server that generates integrated information and the data group, and the patent document in which the content of the invention is described in a predetermined language is included in the content of the invention. Based on this, for patent documents in which the content of the invention is not described in a predetermined language, classification information of each patent document is generated using a learning model based on the translated text in which the content of the invention is translated into a predetermined language.
  • the third server has a server, and the third server uses the document number as a key for the patent documents for which the first server has acquired the format information after the first integrated information is completed, and the data included in the first integrated information.
  • the first integrated information and the classification information are arranged in a predetermined order.
  • the second integrated information is generated, and the generated second integrated information is registered in the search database.
  • the fourth server generates classification information while the third server generates the first integrated information.
  • the fourth server generates a translation of a patent document in which the content of the invention is not described in a predetermined language.
  • It is preferable that the third server stores, in one second table and in association with each other, the second integrated information generated from the patent documents related to a family application, and that the management system further has a fifth server that sets an index common to the patent documents related to the family application as the index of the second integrated information.
  • the third server uses a learning model to generate technical fields or translation data of each patent document as search information.
  • the fourth server uses the classification information generated for the patent document related to the specific application as the classification information of the patent document related to the family application of the specific application.
  • the fourth server further generates a plurality of first feature vectors having different numbers of dimensions for each patent document, and each data of the first integrated information and the classification information is displayed in the second table.
  • the items and a plurality of first feature vectors are arranged in a predetermined order, and the third server uses the document number as a key to obtain each data included in the first integrated information and each data included in the classification information.
  • It is preferable that the fourth server generates, as the first feature vectors, feature vectors whose values are hash values obtained by converting the second feature vector of each patent document using a plurality of different LSHs (locality-sensitive hashing functions).
  • It is preferable to further have a fifth server that generates a plurality of first feature vectors for designated data specified by the user, and extracts the patent documents corresponding to the designated data by comparing, in ascending or descending order of the number of dimensions, the first feature vectors generated for the designated data with the first feature vectors generated for each patent document.
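The multi-resolution hashing and staged comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say which LSH family is used, so random-hyperplane hashing is swapped in here, and the signature lengths, the vector size, the candidate-halving rule, and all function names are hypothetical.

```python
import random

BIT_LENGTHS = (8, 32)  # hypothetical "numbers of dimensions" of the first feature vectors
VECTOR_DIM = 4         # hypothetical size of the second feature vector


def make_hyperplanes(num_bits, dim, seed):
    """Draw num_bits random hyperplanes; each one contributes one signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


# One independent LSH function per signature length (assumption).
PLANES = {bits: make_hyperplanes(bits, VECTOR_DIM, seed=bits) for bits in BIT_LENGTHS}


def signature(vector, planes):
    """Hash a vector to bits: 1 if it lies on the non-negative side of a hyperplane."""
    return tuple(
        1 if sum(v * p for v, p in zip(vector, plane)) >= 0 else 0
        for plane in planes
    )


def first_feature_vectors(second_feature_vector):
    """Return one hashed signature per configured length."""
    return {bits: signature(second_feature_vector, PLANES[bits]) for bits in BIT_LENGTHS}


def hamming(a, b):
    """Number of positions at which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


def search(query_vector, index, top_k=1):
    """Compare signatures in ascending order of length, narrowing candidates each stage."""
    query = first_feature_vectors(query_vector)
    candidates = sorted(index)
    lengths = sorted(BIT_LENGTHS)
    for stage, bits in enumerate(lengths):
        # Ties are broken by document number so the result is deterministic.
        candidates.sort(key=lambda doc_no: (hamming(query[bits], index[doc_no][bits]), doc_no))
        is_last = stage == len(lengths) - 1
        keep = top_k if is_last else max(top_k, len(candidates) // 2)
        candidates = candidates[:keep]
    return candidates


# Toy second feature vectors keyed by made-up document numbers.
documents = {
    "DOC-A": [1.0, 2.0, 0.5, -1.0],
    "DOC-B": [-1.0, -2.0, -0.5, 1.0],
    "DOC-C": [0.3, -0.7, 2.0, 0.1],
    "DOC-D": [2.0, 0.1, -1.5, 0.4],
}
index = {doc_no: first_feature_vectors(vec) for doc_no, vec in documents.items()}
result = search([1.0, 2.0, 0.5, -1.0], index)
```

Comparing the short signatures first discards clearly dissimilar documents cheaply, so the longer signatures are compared only for the remaining candidates.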
  • It is preferable to further have a fifth server that generates first display data for displaying, side by side, a plurality of patent documents corresponding to the designated data specified by the user, and a terminal device that displays the plurality of patent documents side by side according to the first display data.
  • It is preferable that the fifth server generates second display data in which the patent documents displayed by the first display data are rearranged based on their degree of similarity to the patent documents specified by the user on the terminal device or the patent documents displayed continuously for a predetermined time or longer.
  • the third server outputs information on patent documents for which the second server has not acquired bibliographic information among the patent documents for which the first server has acquired formal information.
  • the first server uses the document number as a key for each data included in the second integrated information for the patent documents for which the first server has acquired the format information after the second integrated information is completed. And the data included in the content information, the second integrated information and the content information are stored in the corresponding positions of the third table in which the data items of the second integrated information and the content information are arranged in a predetermined order. It is preferable to generate the third integrated information in which the above are integrated and register the generated third integrated information in the search database.
  • It is preferable that the fourth server stores, for each major classification of the technical field, a learning model for specifying the minor classification of the technical field, identifies the major classification of the technical field of each patent document, specifies the minor classification of the technical field of each patent document by using the learning model corresponding to the identified major classification, and generates the major classification and the minor classification of the technical field of each patent document as classification information.
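The two-level classification described above (identify the major classification, then apply the model stored for that major classification) might look like the sketch below. The keyword matchers are toy stand-ins for trained learning models, and the labels, rules, and function names are illustrative assumptions rather than anything specified in the source.

```python
from typing import Callable, Dict, Tuple

Model = Callable[[str], str]


def keyword_model(rules: Dict[str, str], default: str) -> Model:
    """Toy stand-in for a trained learning model: the first matching keyword wins."""
    def predict(text: str) -> str:
        lowered = text.lower()
        for keyword, label in rules.items():
            if keyword in lowered:
                return label
        return default
    return predict


# Hypothetical model that identifies the major classification.
major_model = keyword_model({"database": "G06", "engine": "F02"}, default="unknown")

# One hypothetical minor-classification model per major classification.
minor_models: Dict[str, Model] = {
    "G06": keyword_model({"search": "G06F16", "legal": "G06Q50"}, default="G06-other"),
    "F02": keyword_model({"turbine": "F02C"}, default="F02-other"),
}


def classify(text: str) -> Tuple[str, str]:
    """Return (major, minor) using the minor model that matches the major class."""
    major = major_model(text)
    minor_model = minor_models.get(major)
    minor = minor_model(text) if minor_model is not None else "unknown"
    return major, minor


classification = classify("A search database for patent documents")
```

Keeping a separate minor model per major classification lets each model specialise in a narrow label set instead of distinguishing every minor classification at once.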
  • the management method is a management method in a management system for registering information on a plurality of patent documents in a search database, in which a first server is used by a patent office in the home country and a plurality of other countries within a predetermined period.
  • format information including at least a document number and data that can uniquely identify the patent document, and at least an invention of each patent document.
  • the format information is acquired for each patent document from the data group including the content information including the content of, and the second server obtains the format information for each patent document from the data group, including the document number and different from the format information.
  • the data included in the format information is arranged in a predetermined order, and the data items of the format information and the bibliographic information are arranged in a predetermined order.
  • each data included in the bibliographic information is used as the key for the bibliographic information in the first table.
  • the fourth server acquires the document number and the content information for each patent document from the data group, and the invention Patent documents whose contents are described in a predetermined language are based on the contents of the invention, and patent documents whose contents are not described in a predetermined language are based on a translation in which the contents of the invention are described in a predetermined language.
  • the third server uses the learning model to generate classification information for each patent document, and the third server keys the document number for the patent document for which the first server has acquired the format information after the first integrated information is completed.
  • each data included in the first integrated information and each data included in the classification information are placed at corresponding positions in the second table in which the data items of the first integrated information and the classification information are arranged in a predetermined order.
  • the second integrated information in which the first integrated information and the classification information are integrated is generated, and the generated second integrated information is registered in the search database.
  • the management system is a management system for registering information on a plurality of patent documents in a search database, and collects data from a database owned by a patent office in the home country and a patent office in a plurality of other countries in a predetermined period.
  • the format information including at least the document number and the data that can uniquely identify the patent documents and the content information including at least the contents of the invention of each patent document are provided.
  • the first server that acquires format information for each patent document and the bibliographic information that includes data whose data format differs from country to country for patent documents filed or registered with patent offices in multiple countries.
  • the second server that acquires the document number and the bibliographic information converted into the common data format and the first server obtain the format information from the external database that has been converted into the common data format and stored.
  • each data included in the formal information is stored in the corresponding position of the first table in which each data item of the formal information and the bibliographic information is arranged in a predetermined order, so that the formal information and the bibliography are stored.
  • the third server that generates the first integrated information that integrates the information, the document number and the content information for each patent document are acquired from the data group, and the search information of each patent document is searched using the learning model based on the content information.
  • the third server uses the document number as a key for the patent document for which the first server has acquired the format information after the first integrated information is completed, and the third server uses the document number as a key to generate the first integrated information.
  • the management method is a management method in a management system for registering information on a plurality of patent documents in a search database, in which a first server is used by a patent office in the home country and a plurality of other countries within a predetermined period.
  • format information including at least a document number and data that can uniquely identify the patent document, and at least an invention of each patent document.
  • format information is acquired from the data group including the content information including the content of, and the second server obtains data for each country about the patent documents applied for or registered with the patent offices of multiple countries.
  • the document number and the bibliographic information converted into the common data format are acquired from the external database in which the bibliographic information including the data having different formats is converted into the common data format and stored.
  • the first server has acquired the format information
  • the third server uses the document number as a key for the patent documents for which the first server has acquired the format information after the first integrated information is completed, including generating the search information of each patent document using the learning model. , Each data included in the first integrated information and each data included in the search information generated before the first integrated information is completed, each data item of the first integrated information and the search information is in a predetermined order. By storing in the corresponding positions of the arranged second table, the second integrated information that integrates the first integrated information and the search information generated before the first integrated information is completed is generated and generated. The second integrated information is registered in the search database.
  • the management system and management method can efficiently manage information related to patent documents.
  • FIGS. 1 to 3 are schematic views for explaining an example of processing by the management system 1 according to the embodiment.
  • the management system 1 is a system for registering information on a plurality of patent documents in a search database.
  • Each patent document is a document relating to a patent application or patent registration, is structured based on a predetermined document format, and includes bibliographic information and content information.
  • the predetermined document format is a storage address in which the patent document is stored, a file name given to the patent document, a language in which the patent document is described, or the like.
  • the data format of each patent document differs depending on the country in which the patent application or registration is made.
  • Each patent document is managed by using the format information corresponding to the document format of each patent document.
  • the formal information includes at least the document number and includes data that can uniquely identify the patent document.
  • bibliographic information is information on bibliographic matters such as application number, issue date, filing date, priority information, etc. described in the patent document, and is used to identify each patent document.
  • the content information is the content described in the patent document, and is the scope of claims, the specification, the drawings, the abstract, and the like. That is, the content information includes at least the content of the invention of each patent document.
  • the format information has a common data format in each country, and the bibliographic information and the content information include data having a different data format in each country.
  • the management system 1 includes an inquiry server 100, a bibliographic server 200, a management server 300, an AI (Artificial Intelligence) server 400, and a search server 500.
  • the inquiry server 100 is an example of the first server
  • the bibliographic server 200 is an example of the second server
  • the management server 300 is an example of the third server
  • the AI server 400 is an example of the fourth server.
  • the search server 500 has a search database 600.
  • the search database 600 stores information on existing patent documents collected in the past.
  • The information processing apparatus 15, the first database 16 owned by the patent office of the home country, the second database 17 owned by the patent offices of a plurality of other countries, and the external database 18 are communicatively connected. Each patent document is filed or registered with the patent offices of each country and stored in the first database 16, the second database 17, and the external database 18.
  • the information processing apparatus 15 collects new patent documents applied for or registered in a predetermined period from the first database 16 and the second database 17, and obtains format information and content information of each patent document.
  • The external database 18 collects patent documents filed or registered with the patent offices of a plurality of countries at an arbitrary timing, converts the bibliographic information of each collected patent document into a data format common to all countries, and stores it in association with the document number of each patent document.
  • the inquiry server 100 acquires format information from the data group 151 of the information processing device 15 for each patent document collected by the information processing device 15 during a predetermined period.
  • the bibliographic server 200 acquires a document number and bibliographic information converted into a common data format for each patent document collected by the information processing apparatus 15 in a predetermined period from the external database 18.
  • Because the external database 18 collects, on its own initiative and at an arbitrary timing, patent documents filed or registered with the patent offices of an arbitrary set of countries, it may not store bibliographic information for some of the patent documents collected by the information processing apparatus 15 in the predetermined period.
  • the information processing apparatus 15 always stores necessary and sufficient format information of patent documents according to the intention of the patent office of the home country. That is, the information acquired for each patent document and the timing for acquiring each information are different between the information processing apparatus 15 and the external database 18.
  • the management server 300 stores a first table in which each data item of format information and bibliographic information is arranged in a predetermined order for each patent document.
  • The management server 300 acquires the format information from the inquiry server 100 for each patent document for which the inquiry server 100 has acquired the format information, and stores each data included in the format information at the corresponding position in the first table. Further, when the bibliographic server 200 has acquired the bibliographic information for a patent document for which the inquiry server 100 has acquired the format information, the management server 300 stores each data included in the bibliographic information at the corresponding position in the first table, using the document number as a key.
  • When the bibliographic server 200 has not acquired the bibliographic information for a patent document, the management server 300 sets a blank at the position corresponding to the bibliographic information in the first table.
  • the management server 300 generates the first integrated information in which the formal information and the bibliographic information are integrated for each patent document.
  • When the bibliographic server 200 collects the patent documents itself and generates the bibliographic information, it can always acquire the bibliographic information, so the process of setting a blank at the position corresponding to the bibliographic information in the first table may be omitted.
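The first-table integration described above, including the blank set when bibliographic information is missing, can be sketched as follows. The column names, record layout, and function names are assumptions made for illustration; the source only states that data items are arranged in a predetermined order and joined on the document number.

```python
# Hypothetical predetermined column order of the first table.
FORMAT_ITEMS = ["document_number", "storage_address", "file_name", "language"]
BIBLIO_ITEMS = ["application_number", "filing_date"]
FIRST_TABLE_COLUMNS = FORMAT_ITEMS + BIBLIO_ITEMS

BLANK = ""  # value stored when bibliographic information was not acquired


def build_first_integrated_info(format_records, biblio_records):
    """Join format and bibliographic records on the document number."""
    biblio_by_number = {rec["document_number"]: rec for rec in biblio_records}
    rows = []
    for fmt in format_records:
        biblio = biblio_by_number.get(fmt["document_number"], {})
        row = [fmt.get(col, BLANK) for col in FORMAT_ITEMS]
        row += [biblio.get(col, BLANK) for col in BIBLIO_ITEMS]
        rows.append(row)
    return rows


def documents_missing_bibliography(rows):
    """List document numbers whose bibliographic columns are all blank."""
    start = len(FORMAT_ITEMS)
    return [row[0] for row in rows if all(cell == BLANK for cell in row[start:])]


# Made-up sample data: the second document has no bibliographic record.
format_records = [
    {"document_number": "DOC-1", "storage_address": "/data/1", "file_name": "1.xml", "language": "ja"},
    {"document_number": "DOC-2", "storage_address": "/data/2", "file_name": "2.xml", "language": "en"},
]
biblio_records = [
    {"document_number": "DOC-1", "application_number": "APP-1", "filing_date": "2020-03-17"},
]

first_table = build_first_integrated_info(format_records, biblio_records)
missing = documents_missing_bibliography(first_table)
```

Iterating over the format records rather than the bibliographic records guarantees one row per document that the inquiry server acquired, whether or not the external database had its bibliographic data.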
  • the AI server 400 acquires a document number and content information for each patent document collected by the information processing device 15 in a predetermined period from the data group 151 of the information processing device 15.
  • While the management server 300 is generating the first integrated information, the AI server 400 generates, based on the bibliographic information or content information of each patent document, search information that is not described in the patent document and is used for searching.
  • The search information includes machine translations of foreign patent documents, classification information indicating the classification of technical fields (patent classification) of patent documents, keywords representing technical features of inventions disclosed in patent documents, metadata of drawings in patent documents, feature vectors showing features of patent documents, and the like.
  • For a patent document 11 in which the content of the invention is not described in the language of the home country, that is, the language used by the patent office (the Japan Patent Office) that provides the search system to which the management system 1 is applied, the AI server 400 generates a translation into that language.
  • The AI server 400 uses a learning model to generate classification information for each patent document, based on the content of the invention for Patent Document 01, in which the content of the invention is described in the language of the home country, and based on the generated translation for Patent Document 11, in which the content of the invention is not described in the language of the home country.
  • the management server 300 stores a second table in which each data item of the first integrated information and the search information is arranged in a predetermined order for each patent document.
  • After the first integrated information is completed, for each patent document for which the inquiry server 100 has acquired the format information, the management server 300 stores each data included in the first integrated information and each data included in the search information at the corresponding positions in the second table, using the document number as a key.
  • the management server 300 generates the second integrated information in which the first integrated information and the search information are integrated.
  • the management server 300 registers the generated second integrated information in the search database 600 of the search server 500.
  • For each patent document collected by the information processing device 15 in a predetermined period, the inquiry server 100 and the bibliographic server 200 can acquire, in a short time, the format information from the data group 151 of the information processing device 15 and the bibliographic information from the external database 18.
  • Since the AI server 400 generates search information that is not described in each patent document, it takes a long time to generate the search information. Without waiting for the completion of the search information, the management server 300 first integrates the format information and the bibliographic information acquired from separate devices to generate the first integrated information, and then integrates the search information to generate the second integrated information, so the second integrated information can be generated efficiently in a short time.
  • The management server 300 collectively generates the second integrated information necessary for registering, in the search database 600, the plurality of patent documents collected by the information processing apparatus 15.
  • The management server 300 can collectively register information on a plurality of patent documents collected by the information processing apparatus 15 in the search database 600, and can update the search database 600 efficiently and in a short time. Therefore, the management system 1 can efficiently manage information related to patent documents.
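The timing described above, in which slow search-information generation runs while the first integration is carried out and the results are merged afterwards, can be sketched with a background worker. The thread pool, the trivial keyword extraction standing in for the learning model, and every name and record layout below are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_search_info(content_by_number):
    """Stand-in for the slow AI step: derive up to three keywords per document."""
    return {
        number: {"keywords": sorted(set(text.lower().split()))[:3]}
        for number, text in content_by_number.items()
    }


def integrate_first(format_by_number, biblio_by_number):
    """First integration: format information plus bibliographic information."""
    return {
        number: {**fmt, **biblio_by_number.get(number, {})}
        for number, fmt in format_by_number.items()
    }


def run_update(format_by_number, biblio_by_number, content_by_number):
    """Start search-info generation, integrate meanwhile, then merge both results."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(generate_search_info, content_by_number)
        first = integrate_first(format_by_number, biblio_by_number)  # runs concurrently
        search_info = pending.result()  # wait only after the first integration is done
    # Second integration, keyed by document number.
    return {number: {**row, **search_info.get(number, {})} for number, row in first.items()}


second = run_update(
    {"DOC-1": {"language": "en"}},
    {"DOC-1": {"filing_date": "2020-03-17"}},
    {"DOC-1": "Patent search server"},
)
```

Because the first integration does not depend on the search information, the only waiting happens at the final merge, which mirrors the ordering the description attributes to the management server 300 and the AI server 400.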
  • FIG. 4 is a diagram showing a schematic configuration of the management system 1 according to the embodiment.
  • the management system 1 manages each information contained in the patent document, and provides the user with a service for searching the patent document using each information such as a technical classification or a keyword.
  • the management system 1 is applied to, for example, a search system such as a patent information platform (J-PlatPat (registered trademark)), a foreign patent information service (FOPSER), Espacenet (registered trademark), and PATENTSCOPE (registered trademark).
  • The patent information platform has a database that stores various publications of patents and utility models from Japan, the United States, the European Patent Office (EPO), the United Kingdom, Germany, France, Switzerland, the World Intellectual Property Organization (WIPO), Canada, South Korea, China, etc., and various documents of the CSDB (Computer Software Data Base).
  • the foreign patent information service has a database that stores various publications of patents and utility models of Russia, Taiwan, Australia, Singapore, Vietnam, Thailand, etc.
  • Espacenet has a database that stores patent gazettes and the like of more than 100 countries provided by the European Patent Office.
  • PATENTSCOPE has a database that stores 71.96 million patent documents, including 3.43 million published PCT international applications.
  • the management system 1 includes an inquiry server 100, a bibliographic server 200, a management server 300, an AI server 400, a search server 500, and the like.
  • The management system 1 further includes a plurality of terminal devices 10, a gateway server 11, a UI (User Interface) server 12, a backup server 13, a log management server 14, an information processing device 15, a first database 16, a plurality of second databases 17, an external database 18, and the like.
  • The plurality of terminal devices 10, the gateway server 11, and the UI server 12 are each communicatively connected to the first network 20.
  • The inquiry server 100, bibliographic server 200, management server 300, AI server 400, search server 500, gateway server 11, UI server 12, backup server 13, log management server 14, information processing device 15, and external database 18 are each communicatively connected to the second network 21.
  • The information processing device 15, the first database 16, and the second database 17 are each communicatively connected to the third network 22.
  • the first network 20, the second network 21, and the third network 22 are a local area network, a cloud network, or the like provided at the business office of the user of the terminal device 10 and the installation location of each server.
  • the management system 1 may have a plurality of each server. Each server is a physical server. In addition, each server may be integrated, and each server may be a virtual server.
  • Each terminal device 10 is a personal computer, a tablet terminal, a smartphone, or the like, and is used by a user who searches for patent documents.
  • Each terminal device 10 has a display device, an input device, a storage device, a memory, a CPU, a communication interface circuit, and the like.
  • The gateway server 11 is a server that relays communication between the search server 500 and each terminal device 10. It instructs the search server 500 to search for patent documents according to instructions from each terminal device 10, receives the search result from the search server 500, and transmits it to each terminal device 10.
  • the UI server 12 is a server that provides a search screen for searching patent documents, and transmits display data for displaying the search screen to each terminal device 10 according to an instruction from each terminal device 10.
  • the backup server 13 is a server that periodically backs up each information stored in the management server 300 and the search server 500.
  • the backup timing is preferably, but not limited to, before the start of updating the data stored in the database 16.
  • the management system 1 can restore each server based on the information stored in the backup server 13, and can improve the continuity of the service.
  • The log management server 14 is a server that monitors the operating status of the inquiry server 100 and the search server 500, the stored data of the first database 16, and the like according to instructions from the management server 300, and notifies the server administrator or the user of the monitoring results.
  • The log management server 14 monitors the operating state of each server in real time, and notifies the server administrator by using an image such as a graph or a table. As a result, the server administrator can recover from an abnormality at an early stage when one occurs. Further, the log management server 14 checks the stored data of the first database 16 before and after a data update in the first database 16, and notifies the server administrator and the user. As a result, the server administrator and the user can prevent a data update from failing because of insufficient free space in the storage device of the first database 16.
  • The log management server 14 periodically collects information on newly added patent documents from the inquiry server 100, the management server 300, or the search server 500, and tallies the collected information by issue year or by issuing institution.
  • The log management server 14 notifies the system administrator or the user of the aggregated information by using an image such as a graph or a table. As a result, the system administrator or the user can grasp the distribution of the number of patent documents by issue year or by issuing institution, and the log management server 14 can improve convenience for the user.
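  • The tally by issue year and by issuing institution can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the log management server 14: the function name `tally_documents` and the record fields `issue_year` and `issuing_institution` are hypothetical.

```python
from collections import Counter


def tally_documents(documents):
    """Count patent documents per issue year and per issuing institution."""
    by_year = Counter(doc["issue_year"] for doc in documents)
    by_institution = Counter(doc["issuing_institution"] for doc in documents)
    return by_year, by_institution


# Hypothetical records for newly added patent documents.
new_documents = [
    {"document_number": "D1", "issue_year": 2019, "issuing_institution": "JPO"},
    {"document_number": "D2", "issue_year": 2019, "issuing_institution": "USPTO"},
    {"document_number": "D3", "issue_year": 2020, "issuing_institution": "JPO"},
]

by_year, by_institution = tally_documents(new_documents)
for year, count in sorted(by_year.items()):
    print(year, count)
```

  • The two counters could then be passed to any charting or table component to produce the graph or table shown to the administrator.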
  • the information processing apparatus 15 periodically collects newly applied or registered patent documents from the first database 16 and the second database 17 and distributes them to the inquiry server 100 and the AI server 400.
  • The information processing apparatus 15 transmits an acquisition request signal for newly applied, registered, or updated patent documents to the first database 16 and the second database 17 via the third network 22 at predetermined intervals.
  • the information processing apparatus 15 stores the format information and the content information of each received patent document as a data group 151. That is, the data group 151 includes format information and content information about a plurality of patent documents collected by the information processing apparatus 15 from the first database 16 and the second database 17 in a predetermined period, respectively. Further, the information processing apparatus 15 transmits the received format information and content information of each patent document to the inquiry server 100 and the AI server 400 via the second network 21.
  • The information processing apparatus 15 determines whether or not a family application exists for each acquired patent document and, if one exists, acquires the patent documents related to the family application from the first database 16 and the second database 17. Together with each patent document, the information processing apparatus 15 also acquires information indicating the language in which the patent document is written and the latest update date of the patent document. The information processing apparatus 15 stores the acquired format information and content information of each patent document as a data group 151, and transmits the acquired format information to the inquiry server 100 and the AI server 400. The information processing device 15 need not be communicatively connected to the second network 21; in that case, the administrator of the information processing device 15 may copy the received format information and content information of each patent document to the inquiry server 100 and the AI server 400 by using a USB (Universal Serial Bus) memory or the like.
  • the first database 16 is a database owned by the Japan Patent Office (target patent office), and stores, for example, patent documents applied for or registered at the Japan Patent Office (JPO).
  • the number of the first database 16 is not limited to one, and may be plural.
  • the plurality of second databases 17 are databases owned by a plurality of patent offices of other countries (patent offices other than the target patent office).
  • Each second database 17 is owned by, for example, the US Patent and Trademark Office (USPTO), the European Patent Office (EPO), the World Intellectual Property Organization (WIPO), the Chinese Patent Office (SIPO), the German Patent and Trademark Office (DPMA), or the Korean Patent Office (KIPO).
  • the number of the second database 17 may be one.
  • the external database 18 is a database different from the first database 16 and the second database 17.
  • the external database 18 is, for example, a DocDB (Document Database) managed by the European Patent Office (EPO).
  • The external database 18 collects patent documents applied for or registered with patent offices in a plurality of countries at an arbitrary timing, converts the bibliographic information of each collected patent document into a data format common to all countries, and stores it. That is, for the patent documents filed or registered with the patent offices of a plurality of countries, the external database 18 stores bibliographic information whose country-specific data formats have been converted into a common data format.
  • the number of external databases 18 is not limited to one, and may be plural.
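  • The conversion of country-specific bibliographic data into a common data format can be illustrated with a small sketch. The source layouts below (field names and date formats per office) and the function `to_common_format` are assumptions made for illustration only; they do not describe the actual layout of DocDB or of any patent office's data.

```python
from datetime import datetime

# Hypothetical per-office layouts: (name of the filing-date field, its date format).
SOURCE_LAYOUTS = {
    "JP": ("filing_date", "%Y.%m.%d"),
    "US": ("filed", "%m/%d/%Y"),
}


def to_common_format(office, record):
    """Convert one office-specific bibliographic record into a common layout."""
    date_field, date_format = SOURCE_LAYOUTS[office]
    filing_date = datetime.strptime(record[date_field], date_format).date()
    return {
        "office": office,
        "document_number": record["number"],
        "filing_date": filing_date.isoformat(),  # ISO 8601 as the common date format
    }


jp_common = to_common_format("JP", {"number": "JP1", "filing_date": "2020.03.17"})
us_common = to_common_format("US", {"number": "US1", "filed": "03/17/2020"})
```

  • After conversion, both records carry the same keys and the same date notation, so downstream servers can process them without office-specific logic.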
  • FIG. 5 is a diagram showing a schematic configuration of the inquiry server 100.
  • the inquiry server 100 acquires and stores text data and image data of each patent document issued by each country's patent office including the target patent office, and acquires format information according to the format of each patent document.
  • When the inquiry server 100 receives, from the terminal device 10, an inquiry request signal in which a document number is designated by the user, it transmits the text data and the image data of the patent document corresponding to the designated document number to the terminal device 10. Further, the inquiry server 100 provides the search server 500 with various information used for searching patent documents.
  • the inquiry server 100 includes a first communication device 101, a first storage device 110, a first control device 120, and the like.
  • the first communication device 101 has a communication interface circuit for the inquiry server 100 to communicate with each device via the second network 21 according to a predetermined communication protocol.
  • The predetermined communication protocol is TCP/IP (Transmission Control Protocol/Internet Protocol) or the like.
  • The first communication device 101 sends data received from each device via the second network 21 to the first control device 120, and transmits data received from the first control device 120 to each device via the second network 21.
  • The first storage device 110 includes a memory device such as a RAM (Random Access Memory) or a ROM (Read Only Memory), a fixed disk device such as a hard disk, or a portable storage device such as a flexible disk or an optical disk. Further, the first storage device 110 stores computer programs, databases, tables, etc. used for various processes of the inquiry server 100.
  • the computer program may be installed in the first storage device 110 from a computer-readable portable recording medium using a known setup program or the like.
  • the portable recording medium is, for example, a CD-ROM (compact disc read only memory), a DVD-ROM (digital versatile disc read only memory), or the like.
  • the computer program may be installed from a predetermined server or the like.
  • The first control device 120 is a processor such as a CPU (Central Processing Unit) that operates based on a program stored in the first storage device 110 in advance.
  • As the first control device 120, a DSP (digital signal processor) or a control circuit such as an LSI (large scale integration), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array) may be used.
  • the first control device 120 is connected to the first communication device 101, the first storage device 110, and the like, controls each of these parts, and manages and controls format information.
  • The first control device 120 reads the computer program stored in the first storage device 110 and operates according to the read computer program, thereby functioning as the format information generation unit 121, the third integrated information generation unit 122, and the third integrated information transmission unit 123.
  • FIG. 6 is a diagram showing a schematic configuration of the bibliographic server 200.
  • the bibliographic server 200 stores bibliographic information related to bibliographic matters described in patent documents. As shown in FIG. 6, the bibliographic server 200 includes a second communication device 201, a second storage device 210, a second control device 220, and the like.
  • the second communication device 201 is a communication device similar to the first communication device 101, and has a communication interface circuit for the bibliographic server 200 to communicate with each device via a second network 21 according to a predetermined communication protocol.
  • The second communication device 201 sends the data received from each device via the second network 21 to the second control device 220, and transmits the data received from the second control device 220 to each device via the second network 21.
  • The second storage device 210 is a storage device similar to the first storage device 110. Further, the second storage device 210 stores computer programs, databases, tables, etc. used for various processes of the bibliographic server 200.
  • the computer program may be installed in the second storage device 210 by using a known setup program or the like from a computer-readable portable recording medium such as a CD-ROM or a DVD-ROM, or from a predetermined server or the like.
  • the second control device 220 is a control device similar to the first control device 120, and operates based on a program stored in advance in the second storage device 210.
  • As the second control device 220, a processor or control circuit such as a CPU, DSP, LSI, ASIC, or FPGA is used.
  • the second control device 220 is connected to the second communication device 201, the second storage device 210, and the like, controls each of these parts, and manages and controls bibliographic information.
  • the second control device 220 reads the computer program stored in the second storage device 210 and operates according to the read computer program, thereby functioning as the bibliographic information generation unit 221.
  • FIG. 7 is a diagram showing a schematic configuration of the management server 300.
  • The management server 300 manages the processing of each of the inquiry server 100, the bibliographic server 200, the management server 300, the AI server 400, and the search server 500, the data stored in each server, and the communication between the servers.
  • the management server 300 collects information on patent documents and integrates them so that users can search them.
  • the management server 300 includes a third communication device 301, a third storage device 310, a third control device 320, and the like.
  • the third communication device 301 is a communication device similar to the first communication device 101, and has a communication interface circuit for the management server 300 to communicate with each device via a second network 21 according to a predetermined communication protocol.
  • The third communication device 301 sends the data received from each device via the second network 21 to the third control device 320, and transmits the data received from the third control device 320 to each device via the second network 21.
  • The third storage device 310 is a storage device similar to the first storage device 110. Further, the third storage device 310 stores computer programs, databases, tables, etc. used for various processes of the management server 300.
  • a computer program is a functional module implemented by software running on a processor. The computer program may be installed in the third storage device 310 by using a known setup program or the like from a computer-readable portable recording medium such as a CD-ROM or a DVD-ROM, or from a predetermined server or the like.
  • the third control device 320 is a control device similar to the first control device 120, and operates based on a program stored in advance in the third storage device 310.
  • As the third control device 320, a processor or control circuit such as a CPU, DSP, LSI, ASIC, or FPGA is used.
  • the third control device 320 is connected to the third communication device 301, the third storage device 310, and the like, controls each of these parts, and manages and controls each information related to the patent document.
  • The third control device 320 reads the computer program stored in the third storage device 310 and operates according to the read computer program, thereby functioning as the first integrated information generation unit 321, the second integrated information generation unit 322, and the second integrated information transmission unit 323.
  • FIG. 8 is a diagram showing a schematic configuration of the AI server 400.
  • the AI server 400 uses AI technology to generate search information estimated from patent documents and store the generated search information. As shown in FIG. 8, the AI server 400 includes a fourth communication device 401, a fourth storage device 410, a fourth control device 420, and the like.
  • the fourth communication device 401 is a communication device similar to the first communication device 101, and has a communication interface circuit for the AI server 400 to communicate with each device via a second network 21 according to a predetermined communication protocol.
  • The fourth communication device 401 sends the data received from each device via the second network 21 to the fourth control device 420, and transmits the data received from the fourth control device 420 to each device via the second network 21.
  • The fourth storage device 410 is a storage device similar to the first storage device 110. Further, the fourth storage device 410 stores computer programs, databases, tables, etc. used for various processes of the AI server 400.
  • the computer program may be installed in the fourth storage device 410 using a known setup program or the like from a computer-readable portable recording medium such as a CD-ROM or a DVD-ROM, or from a predetermined server or the like.
  • A first learning model 411, which is a machine translation engine for translating a patent document written in a given language into the language used by the target patent office, is stored for each of a plurality of languages different from the language used by the target patent office.
  • the language used by the target patent office is an example of a predetermined language, and is a language in which patent documents filed with the target patent office are described, such as the native language of the country in which the target patent office is established.
  • a second learning model 412 for specifying a minor classification of the technical field is stored for each major classification of the technical field assigned by the target patent office.
  • The major classification of the technical field is the theme code or the like, and the minor classification is the FI, the F-term, or the like. That is, when the target patent office is the Japan Patent Office, the second learning model 412 for specifying the FI and/or the F-term is stored for each theme code.
  • a second learning model 412 for collectively specifying the FI and the F-term may be stored.
  • When the target patent office is the US Patent Office or the European Patent Office, the minor classification of the technical field is the CPC (Cooperative Patent Classification, the joint US-European patent classification) or the like, and the major classification is a set of multiple CPCs or the like.
  • FI is a classification of each patent document unique to the Japan Patent Office, which is a subdivision of the IPC (International Patent Classification).
  • The IPC is an internationally unified classification based on the technical content of each patent document, created under the Strasbourg Agreement Concerning the International Patent Classification administered by the World Intellectual Property Organization (WIPO).
  • The theme code is a code assigned to each of about 2,600 themes into which the items of the FI (about 200,000 items) are grouped, and represents the scope of the target technology of each patent document.
  • the F-term is a classification system compiled by the Japan Patent Office based on the technical features of the invention described in each patent document, and is a classification symbol used in the classification system.
  • the F-term classifies patent documents according to multiple technical viewpoints that are different from the IPC and FI patent classification systems.
  • the fourth control device 420 is a control device similar to the first control device 120, and operates based on a program stored in advance in the fourth storage device 410.
  • As the fourth control device 420, a processor or control circuit such as a CPU, DSP, LSI, ASIC, or FPGA is used.
  • the fourth control device 420 is connected to the fourth communication device 401, the fourth storage device 410, and the like, controls each of these parts, and manages and controls search information.
  • the fourth control device 420 reads the computer program stored in the fourth storage device 410 and operates according to the read computer program, thereby functioning as the search information generation unit 421.
  • FIG. 9 is a diagram showing a schematic configuration of the search server 500.
  • The search server 500 has a search database 600 used for searching patent documents, and in the search database 600 it collectively manages the format information, bibliographic information, search information, content information, etc. of each patent document on a per-document basis.
  • The search server 500 searches for patent documents according to the user's instructions received from the terminal device 10 via the gateway server 11, and transmits the search results (document numbers of the patent documents, etc.) to the terminal device 10 via the gateway server 11.
  • the search server 500 includes a fifth communication device 501, a fifth storage device 510, a fifth control device 520, and the like.
  • the fifth communication device 501 is a communication device similar to the first communication device 101, and has a communication interface circuit for the search server 500 to communicate with each device via a second network 21 according to a predetermined communication protocol.
  • The fifth communication device 501 sends the data received from each device via the second network 21 to the fifth control device 520, and transmits the data received from the fifth control device 520 to each device via the second network 21.
  • The fifth storage device 510 is a storage device similar to the first storage device 110. Further, the fifth storage device 510 stores computer programs, databases, tables, etc. used for various processes of the search server 500.
  • the computer program may be installed in the fifth storage device 510 using a known setup program or the like from a computer-readable portable recording medium such as a CD-ROM or a DVD-ROM, or from a predetermined server or the like.
  • the fifth storage device 510 is an example of a search database.
  • the management table 511 is stored as data in the fifth storage device 510.
  • the fifth control device 520 is a control device similar to the first control device 120, and operates based on a program stored in advance in the fifth storage device 510.
  • As the fifth control device 520, a processor or control circuit such as a CPU, DSP, LSI, ASIC, or FPGA is used.
  • the fifth control device 520 is connected to the fifth communication device 501, the fifth storage device 510, and the like, controls each of these parts, and manages and controls the management table 511.
  • the fifth control device 520 reads the computer program stored in the fifth storage device 510 and operates according to the read computer program, thereby functioning as the third integrated information storage control unit 521 and the search unit 522.
  • FIGS. 10 to 13 are schematic views showing an example of the data structure of the management table 511.
  • the management table 511 stores the document number, basic information, search information, text data, secondary data, management data, etc. of each patent document for each of the plurality of patent documents.
  • FIGS. 10 to 13 show only the tables for two patent documents corresponding to one family application, but the management table 511 includes a table for each set of family applications.
  • the document number is a publication number or registration number of a patent document.
  • The basic information is information based on the format information and bibliographic information of each patent document, and includes the issuing institution, language, document type, application number, issue date, filing date, theme code, F-term, FI, IPC, applicant name, inventor name, and the like.
  • The issuing institution is the patent office that issued the patent document.
  • the language is the language in which the patent document is described.
  • the document type is the type of the patent document (public publication, patent publication, etc.).
  • the application number is the application number assigned to the application relating to the patent document.
  • the issue date is the date on which the patent document was issued.
  • The filing date is the date on which the application relating to the patent document was filed.
  • the applicant's name is the name of the applicant for the application relating to the patent document.
  • the inventor's name is the name of the inventor of the invention described in the patent document.
  • When an item has a plurality of values, the values are separated by a delimiter such as a comma and stored in one field.
  • In the illustrated example, the target patent office is the Japan Patent Office and the classifications of technical fields are the theme code, F-term, FI, and IPC, but the classifications of technical fields are set according to the classifications assigned by the target patent office.
  • the search information is information estimated from each patent document and includes a theme code, F-term, FI, machine translation, translation method, drawing metadata, a plurality of first feature vectors, and the like.
  • the theme code, F-term, and FI are the theme code, F-term, and FI of each patent document estimated using the learning model, respectively.
  • The machine translation is a translation in which the text data of each patent document is translated into the language used by the target patent office using a learning model.
  • the translation method is a method of each machine translation, for example, statistical machine translation (SMT) or neural machine translation (NMT).
  • For each of one or more translation methods, the search information includes the machine translation produced by that translation method.
  • the drawing metadata is the feature information (incidental information) of the drawings of each patent document estimated using the learning model.
  • the first feature vector is a feature vector showing the features of each patent document.
  • the search information may further include keywords and the like of each patent document estimated using the learning model.
  • the text data is text data included in each patent document, and includes the title of the invention, an abstract, the scope of claims, a detailed explanation, the entire text, and the like.
  • the data included in the content information is stored as text data.
  • the secondary data is secondary (incidental) data generated by analyzing the format information, bibliographic information, and text data of each patent document, and includes a family ID, a representative document flag, and the like.
  • the family ID is identification information indicating a patent document corresponding to a family application (a group of applications filed in each country based on the same patent application) related to each patent document.
  • the representative document flag indicates the patent document having the highest priority among the patent documents corresponding to the family application.
  • the management information is information based on the format information of each patent document, and includes the update date, storage address, file name, search server name, inquiry server name, and the like.
  • The update date is the latest update date of each patent document.
  • the storage address is an address in which the text file of each patent document is stored.
  • the file name is the file name of each patent document.
  • the search server name is the identification information of the server to be accessed when searching each patent document, and is the identification information of the search server that stores the management table of each patent document.
  • The inquiry server name is identification information of the server to be accessed when inquiring about each patent document, and is identification information of the inquiry server that stores the text data and image data of each patent document.
  • the data type indicates the type of data (character string, numerical value, etc.) stored in the management table 511.
  • the index is an index of each record and is used for searching patent documents. For example, as an index of the theme code, F-term, FI, and IPC, a character string indicating the theme code, F-term, FI, and IPC is set.
  • The indexes of the applicant name, the inventor name, the translated text, and each text data are set in morpheme units for a language whose words are separated by blanks, such as English, and in N-gram units for a language whose words are not separated by blanks, such as Japanese.
  • the value of each element of the first feature vector is set as the index of the first feature vector. Blanks are set for the indexes of other items.
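  • The two ways of deriving index terms from text can be sketched as follows. This is a simplified illustration: plain whitespace splitting stands in for morpheme analysis, the N-gram length of 2 is an arbitrary choice, and the function names `ngrams` and `index_terms` are hypothetical.

```python
def ngrams(text, n=2):
    """Return the overlapping character N-grams of text."""
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def index_terms(text, space_delimited, n=2):
    """Derive index terms: words for space-delimited languages, N-grams otherwise."""
    if space_delimited:
        # Stand-in for morpheme analysis: lower-case and split on whitespace.
        return text.lower().split()
    return ngrams(text, n)


english_terms = index_terms("Patent search system", space_delimited=True)
japanese_terms = index_terms("特許文献", space_delimited=False)
print(english_terms)
print(japanese_terms)
```

  • Character N-grams avoid the need for word segmentation in languages such as Japanese, at the cost of a larger index.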
  • a record is an example of a data item.
  • each information is managed for each patent document corresponding to the family application, and a record is set for each patent document.
  • In the illustrated example, a patent document relating to a Japanese application and a patent document relating to an international application that is a family application thereof are stored in the same table, and the information regarding each patent document is stored in a separate record.
  • a common index is stored in patent documents related to family applications.
  • The theme code, F-term, and FI are assigned to patent documents filed with the Japan Patent Office, but not to patent documents filed with patent offices other than the Japan Patent Office. Therefore, blanks are set for the theme code, F-term, and FI in the basic information of patent documents filed with patent offices other than the Japan Patent Office. Further, for patent documents filed with the Japan Patent Office, the theme code, F-term, and FI are not estimated using AI, and blanks are set for the theme code, F-term, and FI in the search information. Similarly, for patent documents filed in Japanese, a machine translation is not generated using AI, and blanks are set for the machine translation and translation method in the search information.
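  • The rule for which fields are filled and which are left blank can be sketched as follows. This is a minimal illustration limited to the FI and the machine translation. The `Record` class, the `build_record` function, and the stand-in functions `fake_estimate_fi` and `fake_translate` are hypothetical; the sketch assumes that the target patent office is the Japan Patent Office and that its language is Japanese.

```python
from dataclasses import dataclass

TARGET_OFFICE = "JP"    # assumption: the target patent office is the Japan Patent Office
TARGET_LANGUAGE = "ja"  # assumption: the language used by the target patent office


@dataclass
class Record:
    document_number: str
    issuing_institution: str
    language: str
    basic_fi: str = ""             # FI from the bibliographic information
    estimated_fi: str = ""         # FI estimated by a learning model
    machine_translation: str = ""  # translation into the target language


def build_record(doc, estimate_fi, translate):
    """Fill one record, leaving blank the fields that do not apply to the document."""
    record = Record(doc["document_number"], doc["issuing_institution"], doc["language"])
    if doc["issuing_institution"] == TARGET_OFFICE:
        record.basic_fi = doc["fi"]  # assigned by the office, so nothing is estimated
    else:
        record.estimated_fi = estimate_fi(doc["text"])
    if doc["language"] != TARGET_LANGUAGE:
        record.machine_translation = translate(doc["text"])
    return record


# Stand-ins for the learning models; real models would be far more involved.
def fake_estimate_fi(text):
    return "G06F16/31"


def fake_translate(text):
    return "[ja] " + text


jp = build_record(
    {"document_number": "JP1", "issuing_institution": "JP", "language": "ja",
     "fi": "G06F16/30", "text": "特許文献"},
    fake_estimate_fi, fake_translate)
wo = build_record(
    {"document_number": "WO1", "issuing_institution": "WO", "language": "en",
     "text": "patent document"},
    fake_estimate_fi, fake_translate)
```

  • In this sketch the Japanese record keeps only the office-assigned FI, while the international record keeps only the estimated FI and a machine translation, mirroring the blanks described above.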
  • The search server 500 can execute a search in which conditions spanning a plurality of fields are specified by using a simple search formula. For example, the search server 500 can easily detect patent documents that satisfy both search conditions even when a keyword, metadata of drawings, and the like are collectively specified as search conditions. Therefore, the search server 500 can search big data efficiently and at high speed.
  • The search server 500 can collectively collate the classification of the technical field described in the bibliographic items of the patent document and the classification of the technical field estimated by the AI server 400, and can search efficiently and at high speed.
  • The search server 500 can collectively collate the original text of the patent document and the machine-translated text generated by the AI server 400, and can search efficiently and at high speed.
  • When a keyword or the like is specified as a search condition, the search server 500 can collectively collate the original text of the patent document and the first feature vector generated by the AI server 400, and can search efficiently and at high speed.
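  • A search that collates original and AI-generated fields together can be sketched as follows. This is an illustrative sketch only: it scans the records linearly instead of using the indexes of the management table 511, and the record layout and the function names `matches_classification`, `matches_keyword`, and `search` are hypothetical.

```python
def matches_classification(record, fi):
    """True if fi equals the bibliographic FI or the estimated FI of the record."""
    return fi in (record["basic"]["fi"], record["search"]["fi"])


def matches_keyword(record, keyword):
    """True if keyword appears in the original text or in the machine translation."""
    return keyword in record["text"] or keyword in record["search"]["machine_translation"]


def search(table, fi=None, keyword=None):
    """Return the document numbers of records satisfying every given condition."""
    hits = []
    for record in table:
        if fi is not None and not matches_classification(record, fi):
            continue
        if keyword is not None and not matches_keyword(record, keyword):
            continue
        hits.append(record["document_number"])
    return hits


# Hypothetical records for two patent documents of one family.
table = [
    {"document_number": "JP1", "text": "特許文献の検索システム",
     "basic": {"fi": "G06F16/31"},
     "search": {"fi": "", "machine_translation": ""}},
    {"document_number": "WO1", "text": "A patent document search system",
     "basic": {"fi": ""},
     "search": {"fi": "G06F16/31", "machine_translation": "特許文献検索システム"}},
]

both = search(table, fi="G06F16/31", keyword="検索")
english_only = search(table, keyword="patent")
```

  • Because each record holds the original and the estimated values side by side, one query in the target language finds both the Japanese document and the foreign-language family member.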
  • FIG. 14 shows an example of an operation sequence related to the update process by the management system 1.
  • The operation sequence described below is executed mainly by the control device of each server, in cooperation with each element of that server, based on the program stored in advance in the storage device of each server of the management system 1. This operation sequence is executed at regular intervals (for example, every week).
  • The information processing apparatus 15 transmits, to the inquiry server 100, the format information and content information stored in the data group 151 for a plurality of patent documents collected from the first database 16 and the second database 17 in a predetermined period (for example, the latest one week) (step S101).
  • The information processing device 15 transmits the format information to the inquiry server 100 on its own initiative.
  • the information processing device 15 may transmit the format information and the content information to the inquiry server 100 in accordance with the request from the inquiry server 100.
  • The format information generation unit 121 of the inquiry server 100 receives the format information and the content information from the information processing device 15 via the first communication device 101. As a result, the format information generation unit 121 acquires format information from the data group 151 for a plurality of patent documents collected by the information processing apparatus 15 from the first database 16 and the second database 17 in a predetermined period (step S102).
  • The format information generation unit 121 may instead collect the patent documents themselves from the information processing device 15, identify the document format of each collected patent document, and obtain the format information by generating it according to that document format. Further, since the format of the document numbers extracted from each patent document differs from country to country, the format information generation unit 121 converts the extracted document numbers into a format common within the management system 1.
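  • Converting document numbers into a common format can be illustrated with the sketch below. The common format used here (upper-case letters and digits with all separators removed) and the function name `normalize_document_number` are assumptions for illustration; the source does not specify the actual common format, and this simple rule would not handle numbers written with non-Latin characters.

```python
import re


def normalize_document_number(raw):
    """Remove spaces and punctuation and upper-case the result (hypothetical common format)."""
    return re.sub(r"[^0-9A-Za-z]", "", raw).upper()


international = normalize_document_number("WO 2021/019831 A1")
american = normalize_document_number("us 10,123,456 b2")
print(international, american)
```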
  • FIG. 15 is a schematic diagram showing an example of a data structure of format information.
  • the format information includes the issuing institution, the document number, the document type, the storage address, the file name, the language, the update date, and the like.
  • the storage address and the file name are the address and the file name for storing each patent document in the first storage device 110 of the inquiry server 100.
  • the update date is the date on which each patent document was updated in each database.
  • the format information includes a data type and one or more records for each of the above items.
  • the format information generation unit 121 acquires format information from the data group 151 for the patent documents related to a family application, and stores the format information of those patent documents in one table in association with each other.
  • the format information generation unit 121 sets a record of format information for each patent document related to a family application in one table.
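  • The format information items listed above, and the grouping of family-related records into one table, could be modelled roughly as follows. This is a sketch under assumptions: the field names, the `family_of` mapping, and the function `group_by_family` are hypothetical and do not appear in the source.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FormatInfo:
    """One record of format information (field names are illustrative)."""
    issuing_institution: str
    document_number: str
    document_type: str
    storage_address: str
    file_name: str
    language: str
    update_date: date


def group_by_family(records, family_of):
    """Place the records of each family application in one table (a list).

    `family_of` maps a document number to a family key; how families are
    identified is not stated in the source, so this mapping is an assumption.
    """
    tables = defaultdict(list)
    for record in records:
        tables[family_of[record.document_number]].append(record)
    return dict(tables)
```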
  • the external database 18 transmits, to the bibliographic server 200, the document numbers and the bibliographic information converted into a common data format for a plurality of patent documents collected from the first database 16 and the second database 17 by the information processing apparatus 15 in a predetermined period (step S103).
  • the external database 18 transmits the document number and the bibliographic information to the bibliographic server 200 in accordance with the request from the bibliographic server 200.
  • because the external database 18 collects patent documents applied for or registered with patent offices in a plurality of countries at an arbitrary timing, bibliographic information may not yet be stored for some of the patent documents collected by the information processing apparatus 15 in the predetermined period. Therefore, the external database 18 transmits bibliographic information to the bibliographic server 200 only for those patent documents, among the patent documents collected by the information processing apparatus 15 in the predetermined period, for which it stores bibliographic information converted into the common data format.
  • the bibliographic information generation unit 221 of the bibliographic server 200 receives the document number and the bibliographic information converted into a common data format from the external database 18 via the second communication device 201.
  • as a result, the bibliographic information generation unit 221 acquires, from the external database 18, the document numbers and the bibliographic information converted into the common data format for the plurality of patent documents collected from the first database 16 and the second database 17 by the information processing apparatus 15 in the predetermined period (step S104).
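  • The selective transmission described above amounts to a simple filter over the collected documents. The following minimal sketch assumes the external database can be represented as a dictionary keyed by document number; the function name `select_transmittable` is hypothetical.

```python
def select_transmittable(collected_numbers, bibliographic_store):
    """Return (document_number, bibliographic_info) pairs to transmit.

    Only documents whose bibliographic information is already stored in the
    common data format are included; the others are skipped for this period.
    """
    return [
        (number, bibliographic_store[number])
        for number in collected_numbers
        if number in bibliographic_store
    ]
```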
  • the bibliographic information generation unit 221 may instead generate the bibliographic information by collecting the patent documents themselves from the external database 18, the first database 16, or the second database 17, and extracting the bibliographic items described in each collected patent document. Further, since the format of the document number extracted from each patent document differs from country to country, the bibliographic information generation unit 221 converts the extracted document number into a common format in the management system 1.
  • FIG. 16 is a schematic diagram showing an example of the data structure of bibliographic information.
  • the bibliographic information includes the issuing institution, document number, document type, application number, issue date, filing date, FI, theme code, F-term, IPC, applicant name, inventor name, update date, and the like.
  • the bibliographic information includes a data type and one or more records for each of the above items.
  • the bibliographic information generation unit 221 acquires bibliographic information from the external database 18 for the patent documents related to a family application, and stores the bibliographic information of those patent documents in one table in association with each other.
  • the bibliographic information generation unit 221 sets a record of bibliographic information for each patent document related to a family application in one table.
  • the information processing apparatus 15 transmits the document numbers and the content information stored in the data group 151 to the AI server 400 for the plurality of patent documents collected from the first database 16 and the second database 17 in a predetermined period (step S105).
  • the information processing device 15 voluntarily transmits the document number and the content information to the AI server 400.
  • the information processing device 15 may transmit the document number and the content information to the AI server 400 in accordance with the request from the AI server 400.
  • the search information generation unit 421 of the AI server 400 receives the document number and the content information from the information processing device 15 via the fourth communication device 401. As a result, the search information generation unit 421 acquires the document number and the content information from the data group 151 for the plurality of patent documents collected by the information processing apparatus 15 from the first database 16 and the second database 17 in a predetermined period. Next, the search information generation unit 421 executes a search information generation process for each patent document collected by the information processing apparatus 15 in a predetermined period (step S106). In the search information generation process, the search information generation unit 421 generates search information for each patent document based on the content information of each patent document.
  • the search information generation unit 421 uses the learning model to generate classification information, keywords, metadata, a plurality of first feature vectors, and the like of each patent document as search information.
  • the search information generation unit 421 may collect the patent documents themselves from the information processing device 15, extract bibliographic items from the collected patent documents to generate bibliographic information, and further generate the search information based on the generated bibliographic information. The details of the search information generation process will be described later.
  • Each process of steps S101 to S106 is started at the first timing, which is the start time of a certain period in which the operation sequence related to the update process is executed, and is executed in parallel.
  • the search information generation unit 421 generates the search information while the management server 300 is generating the first integrated information.
  • the AI server 400 and the management server 300 are independent of each other, and the search information generation unit 421 generates search information in parallel with the process in which the first integrated information generation unit 321 of the management server 300 generates the first integrated information.
  • the generation of the search information by the search information generation unit 421 may be completed before the generation of the first integrated information by the first integrated information generation unit 321 starts, or the generation of the first integrated information by the first integrated information generation unit 321 may be completed before the generation of the search information by the search information generation unit 421 starts.
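  • The independence of the two generation processes can be illustrated with two tasks submitted to a thread pool. This is only a sketch: in the source the work runs on separate servers, whereas here both stand-in functions (`generate_search_information`, `generate_first_integrated_information`) are hypothetical placeholders running in one process.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_search_information(content_by_number):
    # Stand-in for the AI server's search information generation process.
    return {
        number: {"keywords": sorted(set(text.lower().split()))}
        for number, text in content_by_number.items()
    }


def generate_first_integrated_information(format_by_number, biblio_by_number):
    # Stand-in for the management server's integration of format and bibliographic data.
    return {
        number: {**fmt, **biblio_by_number.get(number, {})}
        for number, fmt in format_by_number.items()
    }


def run_in_parallel(content_by_number, format_by_number, biblio_by_number):
    """Run both generation tasks concurrently; neither waits for the other."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        search_future = pool.submit(generate_search_information, content_by_number)
        first_future = pool.submit(
            generate_first_integrated_information, format_by_number, biblio_by_number
        )
        return first_future.result(), search_future.result()
```

  • Because neither task reads the other's output, either one may finish first, which matches the ordering freedom described above.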
  • the first integrated information generation unit 321 of the management server 300 transmits a format information request for requesting acquisition of format information to the inquiry server 100 via the third communication device 301 (step S107).
  • when the format information generation unit 121 of the inquiry server 100 receives the format information request from the management server 300 via the first communication device 101, it transmits the format information acquired in step S102 to the management server 300 via the first communication device 101 (step S108).
  • the first integrated information generation unit 321 of the management server 300 transmits a bibliographic information request for requesting acquisition of bibliographic information to the bibliographic server 200 via the third communication device 301 (step S109).
  • when the bibliographic information generation unit 221 of the bibliographic server 200 receives the bibliographic information request from the management server 300 via the second communication device 201, it transmits the bibliographic information acquired in step S104 to the management server 300 via the second communication device 201 (step S110).
  • steps S107 and S109 are executed at the second timing after the first timing within a certain period during which the operation sequence related to the update process is executed.
  • the second timing is set to a timing at least a sufficient period (for example, two days) after the first timing for completing the acquisition of the format information and the acquisition of the bibliographic information.
  • when the first integrated information generation unit 321 of the management server 300 receives the format information from the inquiry server 100 and the bibliographic information from the bibliographic server 200 via the third communication device 301, it generates first integrated information in which the received format information and bibliographic information are integrated (step S111).
  • the first integrated information generation unit 321 generates the first integrated information for each patent document for which the inquiry server 100 has acquired the format information.
  • the management server 300 stores in the third storage device 310 a first table in which data items of format information and bibliographic information are arranged in a predetermined order for each patent document.
  • the first integrated information generation unit 321 stores each data included in the format information received from the inquiry server 100 for each patent document at a corresponding position in the first table. Further, when the bibliographic server 200 has acquired the bibliographic information from the external database 18 for a patent document, the first integrated information generation unit 321 uses the document number received from the bibliographic server 200 as a key and stores each data included in the bibliographic information received from the bibliographic server 200 at the corresponding position in the first table.
  • on the other hand, when the bibliographic server 200 has not acquired the bibliographic information for a patent document, the first integrated information generation unit 321 sets a blank at the position corresponding to the bibliographic information in the first table. As a result, the first integrated information generation unit 321 generates the first integrated information for each patent document for which the inquiry server 100 has acquired the format information.
  • the first integrated information generation unit 321 may output information on patent documents for which the bibliographic server 200 has not acquired bibliographic information from the external database 18 among the patent documents for which the inquiry server 100 has acquired format information.
  • the first integrated information generation unit 321 outputs, for example, the document number of the patent document as information on the patent document for which the bibliographic server 200 has not acquired the bibliographic information.
  • the first integrated information generation unit 321 outputs information about a patent document for which the bibliographic server 200 has not acquired the bibliographic information by transmitting the information regarding the patent document to the log management server 14 via the third communication device 301.
  • the first integrated information generation unit 321 may output information on patent documents for which the bibliographic server 200 has not acquired bibliographic information by displaying it on a display device (not shown).
  • as a result, the administrator of the management system 1 can identify the patent documents for which the bibliographic information converted into the common data format is not stored in the first integrated information, the second integrated information, or the third integrated information, and can update each piece of information individually for such patent documents.
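  • The first-table construction described above, including blanks for missing bibliographic information and the list of affected document numbers, can be sketched as follows. The item names and the function `build_first_table` are assumptions for illustration; the blank value is represented here by an empty string.

```python
FORMAT_ITEMS = [
    "issuing_institution", "document_number", "document_type",
    "storage_address", "file_name", "language", "format_update_date",
]
BIBLIO_ITEMS = [
    "application_number", "issue_date", "filing_date", "fi", "theme_code",
    "f_term", "ipc", "applicant_name", "inventor_name", "biblio_update_date",
]


def build_first_table(format_records, biblio_by_number):
    """Merge format and bibliographic data, keyed by document number.

    Returns the table rows and the document numbers whose bibliographic
    information was not available (to be reported, for example, to a log).
    """
    rows, missing = [], []
    for fmt in format_records:
        number = fmt["document_number"]
        biblio = biblio_by_number.get(number)
        if biblio is None:
            missing.append(number)
            biblio = {}
        # Items are written in a fixed, predetermined order.
        row = {item: fmt.get(item, "") for item in FORMAT_ITEMS}
        row.update({item: biblio.get(item, "") for item in BIBLIO_ITEMS})
        rows.append(row)
    return rows, missing
```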
  • FIG. 17 is a schematic diagram showing an example of the data structure of the first integrated information (first table).
  • the first integrated information includes the issuing institution, the document number, the document type, the storage address, the file name, the language, the update date (of the format information), the application number, the issue date, the application date, the FI, the theme code, the F-term, the IPC, the applicant name, the inventor name, and the update date (of the bibliographic information).
  • the first integrated information includes a data type and one or more records for each of the above items. In this way, the data items of the format information and the bibliographic information are arranged in a predetermined order in the first table.
  • the first integrated information generation unit 321 generates the first integrated information by integrating (merging) the combination of the format information and the bibliographic information having the same document number among the received format information and bibliographic information.
  • the first integrated information generation unit 321 generates first integrated information including the issuing institution, the document number, the document type, the storage address, the file name, the language, and the update date included in the format information, and the application number, the issue date, the application date, the FI, the theme code, the F-term, the IPC, the applicant name, the inventor name, and the update date included in the bibliographic information.
  • the first integrated information generation unit 321 may extract the issuing institution, the document number, and the document type from the bibliographic information instead of the format information.
  • the first integrated information generation unit 321 stores the first integrated information generated from each patent document related to the family application in association with each other in one first table.
  • the first integrated information generation unit 321 sets a record of the first integrated information for each patent document related to the family application in one first table.
  • the search information generation unit 421 of the AI server 400 stores each search information generated in the search information generation process in step S106 in the fourth storage device 410 (step S112).
  • FIG. 18 is a schematic diagram showing an example of the data structure of the search information.
  • the search information includes a document number, a theme code, an F-term, an FI, one or more combinations of a machine translation and a translation method, drawing metadata, a plurality of first feature vectors, and the like. Further, although not shown, the search information may further include keywords and the like of each patent document. As shown in FIG. 18, the search information includes a data type and one or more records for each of the above items. Since the format of the document numbers included in each patent document differs from country to country, the search information generation unit 421 converts the document numbers included in each patent document into a common format in the management system 1 and stores them in the search information.
  • the search information generation unit 421 acquires content information from the data group 151 for the patent documents related to a family application, generates search information, and stores the search information of those patent documents in one table in association with each other.
  • the search information generation unit 421 sets search information for each patent document related to a family application in one table.
  • the second integrated information generation unit 322 of the management server 300 transmits a search information request for requesting acquisition of the search information to the AI server 400 via the third communication device 301 (step S113).
  • when the search information generation unit 421 of the AI server 400 receives the search information request from the management server 300 via the fourth communication device 401, it transmits the search information stored in step S112 to the management server 300 via the fourth communication device 401 (step S114).
  • step S113 is executed at the third timing after the second timing within a certain period during which the operation sequence related to the update process is executed.
  • the third timing is set to a timing after at least a sufficient period (for example, 3.5 days) for the search information generation unit 421 to complete the generation of the search information after the first timing.
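  • The relationship between the three timings can be made concrete with a small date calculation. The margins of two days and 3.5 days are the examples given in the text; the function name `update_schedule` and the check that the third timing follows the second are assumptions of this sketch.

```python
from datetime import datetime, timedelta


def update_schedule(first_timing,
                    acquisition_margin=timedelta(days=2),
                    search_margin=timedelta(days=3.5)):
    """Derive the second and third timings from the first timing.

    Both margins are measured from the first timing, as in the text.
    """
    second_timing = first_timing + acquisition_margin
    third_timing = first_timing + search_margin
    if third_timing <= second_timing:
        raise ValueError("the third timing must come after the second timing")
    return {"first": first_timing, "second": second_timing, "third": third_timing}
```

  • For example, if the update period starts at 00:00 on a Monday, requests for format and bibliographic information go out at 00:00 on Wednesday, and the request for search information goes out at 12:00 on Thursday.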
  • when the second integrated information generation unit 322 of the management server 300 receives the search information from the AI server 400 via the third communication device 301, it generates second integrated information in which the first integrated information generated in step S111 and the received search information are integrated (step S115).
  • the second integrated information generation unit 322 generates the second integrated information for each patent document for which the inquiry server 100 has acquired the format information. This second integrated information is used for registration in the search database 600.
  • the management server 300 stores in the third storage device 310 a second table in which each data item of the first integrated information and the search information is arranged in a predetermined order for each patent document.
  • the second integrated information generation unit 322 uses the document number as a key for each patent document, and stores each data included in the first integrated information generated by the first integrated information generation unit 321 and each data included in the search information received from the AI server 400 at the corresponding position in the second table. As a result, the second integrated information generation unit 322 generates the second integrated information for each patent document for which the inquiry server 100 has acquired the format information.
  • FIG. 19 is a schematic diagram showing an example of the data structure of the second integrated information (second table).
  • the second integrated information includes, in addition to each information included in the first integrated information, the theme code, the F-term, the FI, the machine translation, the translation method, the drawing metadata, a plurality of first feature vectors, and the like of the search information.
  • the second integrated information includes a data type and one or more records for each of the above items.
  • the data items of the first integrated information and the search information are arranged in a predetermined order.
  • the second integrated information generation unit 322 generates the second integrated information by integrating (merging) each combination of first integrated information and search information having the same document number among the generated first integrated information and the received search information.
  • the second integrated information generation unit 322 generates the second integrated information including each information included in the first integrated information and each information included in the search information.
  • the second integrated information generation unit 322 stores the second integrated information generated from each patent document related to the family application in association with each other in one second table.
  • the second integrated information generation unit 322 sets a record of the second integrated information for each patent document related to the family application in one second table.
  • the second integrated information generation unit 322 generates the second integrated information for each patent document after the first integrated information relating to each patent document for which the inquiry server 100 has acquired the format information is completed. That is, the second integrated information generation unit 322 does not start generating the second integrated information until the generation of the first integrated information relating to each patent document for which the inquiry server 100 has acquired the format information is completed. As a result, the second integrated information generation unit 322 can efficiently generate the second integrated information.
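  • The second-stage merge, together with the rule that it does not start before the first integrated information is complete, might look like the following sketch. The function name `build_second_table` and the use of an exception to signal an incomplete first stage are assumptions, not the source's mechanism.

```python
def build_second_table(first_rows, search_by_number, expected_numbers):
    """Merge first integrated information with search information.

    Refuses to start until a first-integrated row exists for every document
    whose format information was acquired (`expected_numbers`).
    """
    done = {row["document_number"] for row in first_rows}
    pending = sorted(set(expected_numbers) - done)
    if pending:
        raise RuntimeError(f"first integrated information incomplete: {pending}")
    # Merge only combinations that share the same document number.
    return [
        {**row, **search_by_number[row["document_number"]]}
        for row in first_rows
        if row["document_number"] in search_by_number
    ]
```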
  • the second integrated information transmission unit 323 transmits the second integrated information to the inquiry server 100 via the third communication device 301 (step S116).
  • when the third integrated information generation unit 122 of the inquiry server 100 receives the second integrated information from the management server 300 via the first communication device 101, it generates third integrated information by integrating the received second integrated information with the text data of the content information included in each patent document (step S117).
  • the third integrated information generation unit 122 extracts text data from the content information received in step S101 for each patent document for which the inquiry server 100 has acquired the format information.
  • the text data is an example of the data included in the content information.
  • the third integrated information generation unit 122 converts the extracted text data into a common format in the management system 1.
  • the inquiry server 100 stores in the first storage device 110 a third table in which each data item of the second integrated information and the content information is arranged in a predetermined order for each patent document.
  • the third integrated information generation unit 122 uses the document number as a key and, for each patent document, stores each data included in the second integrated information received from the management server 300 and the text data extracted from the content information at the corresponding position in the third table. As a result, the third integrated information generation unit 122 generates the third integrated information for each patent document for which the inquiry server 100 has acquired the format information.
  • FIG. 20 is a schematic diagram showing an example of the data structure of the third integrated information (third table).
  • the third integrated information includes text data, secondary data, management information, and the like in addition to each information included in the second integrated information.
  • the third integrated information includes a data type and one or more records for each of the above items. In this way, in the third table, each data item of the second integrated information and the content information is arranged in a predetermined order. Further, the data structure of the third integrated information is the same as the data structure of the management table 511 shown in FIGS. 10 to 13.
  • the third integrated information generation unit 122 generates the third integrated information by integrating (merging) each combination of second integrated information and text data having the same document number among the received second integrated information and the extracted text data.
  • the third integrated information generation unit 122 stores the third integrated information generated from each patent document related to the family application in association with each other in one third table.
  • the third integrated information generation unit 122 sets a record of the third integrated information for each patent document related to the family application in one third table.
  • the third integrated information generation unit 122 assigns a family ID to the family application related to each patent document, sets the representative document flag of a specific patent document among the patent documents corresponding to the family application to valid, and stores the family ID and the representative document flag as secondary data.
  • the third integrated information generation unit 122 sets the latest update date of the patent document as the update date. Further, the third integrated information generation unit 122 sets the address in which the text file of each patent document is stored in the own server as the storage address. Further, the third integrated information generation unit 122 sets the identification information of the search server 500 to which the third integrated information is transmitted in the search server name, and sets the identification information of the own server in the inquiry server name. Then, the third integrated information generation unit 122 stores the update date, the storage address, the search server name, and the inquiry server name as management information.
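  • The secondary data and management information could be attached to each record roughly as shown below. The source does not say how the representative document is chosen or how family IDs are numbered, so this sketch assumes the first document encountered in each family is the representative and that IDs are sequential integers; all names are hypothetical.

```python
def attach_secondary_and_management(rows, family_of, search_server, inquiry_server):
    """Add a family ID, a representative document flag and server names to each row."""
    family_ids = {}
    result = []
    for row in rows:
        key = family_of[row["document_number"]]
        # Assumption: the first document seen in a family is the representative.
        is_representative = key not in family_ids
        family_id = family_ids.setdefault(key, len(family_ids) + 1)
        result.append({
            **row,
            "family_id": family_id,
            "representative_flag": is_representative,
            "search_server_name": search_server,
            "inquiry_server_name": inquiry_server,
        })
    return result
```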
  • the third integrated information generation unit 122 generates the third integrated information for each patent document after the second integrated information relating to each patent document for which the inquiry server 100 has acquired the format information is completed. That is, the third integrated information generation unit 122 does not start generating the third integrated information until the generation of the second integrated information relating to each patent document for which the inquiry server 100 has acquired the format information is completed. As a result, the third integrated information generation unit 122 can efficiently generate the third integrated information.
  • the third integrated information transmission unit 123 transmits the third integrated information to the search server 500 via the first communication device 101 so as to collectively register the third integrated information in the search database 600 (step S118).
  • when the third integrated information storage control unit 521 of the search server 500 receives the third integrated information from the inquiry server 100 via the fifth communication device 501, it collectively stores the third integrated information transmitted from the inquiry server 100 in the management table 511 (step S119). As a result, the third integrated information storage control unit 521 collectively registers the third integrated information in the search database 600.
  • the third integrated information storage control unit 521 stores each item included in the third integrated information at the position of the corresponding item in the management table 511. As described above, the data structure of the third integrated information is the same as the data structure of the management table 511.
  • therefore, the third integrated information storage control unit 521 can easily update the management table 511 by simply adding the third integrated information to the management table 511 without processing it. Consequently, the third integrated information storage control unit 521 can reduce the processing load and the processing time of the update processing of the management table 511.
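  • Because the third integrated information has the same structure as the management table, registration reduces to a bulk append. The sketch below uses an in-memory SQLite table purely as a stand-in for the management table 511; the table name, the reduced column set, and the function names are assumptions.

```python
import sqlite3

COLUMNS = ("document_number", "family_id", "representative_flag", "text_data")


def create_management_table(conn):
    # Simplified stand-in for the management table 511.
    conn.execute(
        "CREATE TABLE management_table ("
        "document_number TEXT PRIMARY KEY, family_id INTEGER, "
        "representative_flag INTEGER, text_data TEXT)"
    )


def register_collectively(conn, third_rows):
    """Append all rows in one transaction, without reshaping them."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            f"INSERT INTO management_table ({', '.join(COLUMNS)}) "
            f"VALUES ({placeholders})",
            [tuple(row[column] for column in COLUMNS) for row in third_rows],
        )
```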
  • the third integrated information storage control unit 521 sets an index for each record (data item) of the third integrated information.
  • the third integrated information storage control unit 521 sets, as the index of the data items of each information included in the third integrated information generated from the patent documents related to a family application, an index common to those patent documents.
  • the third integrated information storage control unit 521 uses the index set for the patent document whose representative document flag is set to valid as the common index for the patent documents related to the family application.
  • the search server 500 can perform the search in a short time by using the index when searching the patent documents. As a result, the operation sequence related to the update process is completed.
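  • One way to picture the common family index is a mapping from the representative document's key to all documents of the same family. This is a sketch only: the source does not describe the index structure, so using the representative document number as the index key and the function name `build_family_index` are assumptions.

```python
from collections import defaultdict


def build_family_index(records):
    """Index every document under the key of its family's representative document."""
    representative = {
        record["family_id"]: record["document_number"]
        for record in records
        if record["representative_flag"]
    }
    index = defaultdict(list)
    for record in records:
        index[representative[record["family_id"]]].append(record["document_number"])
    return dict(index)
```

  • A lookup by the representative key then returns the whole family in one step, rather than scanning every record.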
  • the inquiry server 100 registers the third integrated information including the second integrated information in the search database 600.
  • the management server 300 may generate the third integrated information and register it in the search database 600.
  • the management server 300 may register the second integrated information in the search database 600, and the inquiry server 100 or the management server 300 may register the content information in the search database 600.
  • the search information generation process takes longer than the acquisition of format information and the acquisition of bibliographic information. Therefore, the search information generation process may be started before the first timing, which is the start time of the acquisition of the format information and the acquisition of the bibliographic information. In that case, the search information generation process may be completed before the first integrated information is completed or before the generation of the first integrated information is started.
  • alternatively, when the first integrated information is completed, the management server 300 may immediately transmit the search information request to the AI server 400 and acquire from the AI server 400 only the search information generated before the first integrated information was completed. In that case, the management server 300 stores each data included in the first integrated information and each data included in the search information generated before the first integrated information was completed at the corresponding position in the second table. As a result, the management server 300 generates the second integrated information that integrates the first integrated information and the search information generated before the first integrated information was completed. For the search information that had not been generated when the first integrated information was completed, the management server 300 generates the second integrated information during the period in which the update process is next executed.
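  • The partial integration described above can be sketched as a merge that separates documents whose search information is ready from those deferred to the next update period. The function name `integrate_available` and the returned `deferred` list are hypothetical conveniences of this example.

```python
def integrate_available(first_rows, search_ready):
    """Integrate rows whose search information already exists; defer the rest.

    `search_ready` holds only the search information generated before the
    first integrated information was completed.
    """
    integrated, deferred = [], []
    for row in first_rows:
        info = search_ready.get(row["document_number"])
        if info is None:
            # Handled in the period when the update process is next executed.
            deferred.append(row["document_number"])
        else:
            integrated.append({**row, **info})
    return integrated, deferred
```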
  • FIG. 21 is a flowchart showing an example of the operation of the search information generation process in the AI server 400.
  • the search information generation process shown in FIG. 21 is executed in step S106 of the update process shown in FIG.
  • the following steps S201 to S206 are executed for each patent document.
  • the search information generation unit 421 uses AI technology to generate translated texts obtained by translating the bibliographic information and content information of each patent document as search information (step S201).
  • the search information generation unit 421 determines whether or not the content of the invention shown in the content information of each patent document, that is, of the target patent document for which the search information is generated, is described in a language different from the language used by the target patent office.
  • when the content of the invention is described in a language different from the language used, the search information generation unit 421 generates a translation of the content of the invention into the language used, using a machine translation engine for translating content described in that language into the language used.
  • the translated text is an example of translated data.
  • the language used is any language such as Japanese, English, German, French, Chinese, and Korean, and any language different from the language in which the patent documents are described. It may be a language. Any translation engine may be used as such a machine translation engine.
  • the AI server 400 may independently generate the first learning model 411 by pre-learning using the learning patent documents described in various languages by using the known AI technology.
  • the search information generation unit 421 transmits a creation request signal requesting the creation of a translated text that translates the bibliographic information and the content information of each patent document to another server, and receives the translated text from the other server. You may get it. Further, the search information generation unit 421 may acquire the translated text created by an external translator by inputting it from an interface device (not shown) according to an interface standard such as USB (Universal Serial Bus).
  • The search information generation unit 421 identifies a major classification of the technical field of the patent document based on the content information of each patent document, and generates the specified major classification as classification information (step S202).
  • The search information generation unit 421 may specify a major classification of the technical field of the patent document based on both the content information and the bibliographic information of each patent document.
  • The search information generation unit 421 determines whether or not a sub-classification of the technical field assigned by the target patent office is specified in each patent document. When the sub-classification is not specified, the search information generation unit 421 first identifies the major classification of the technical field by the following four methods.
  • As a first method, the search information generation unit 421 identifies the major classification of the technical field specified by the target patent office based on the classification of the technical field that is specified by a patent office other than the target patent office and included in each patent document.
  • When the patent office that issued each patent document is the United States Patent and Trademark Office, the European Patent Office, or the like, IPC, CPC, etc. can be used as the classification of the technical field included in each patent document.
  • When the patent office that issued each patent document is the Japan Patent Office, the theme code, FI, F-term, etc. assigned by the Japan Patent Office can be used as the classification of the technical field specified in each patent document.
  • In the AI server 400, the major classification of the technical field specified by the target patent office is set in advance in the fifth storage device 510 in association with each classification of the technical field specified by the patent office of each country.
  • The search information generation unit 421 identifies, from each patent document, the classification of the technical field specified by the patent office of any country, and identifies the major classification of the technical field specified by the target patent office that is set in association with the identified classification.
  • As a second method, the search information generation unit 421 identifies a major classification of the technical field based on statistical data on the correspondence between classifications of technical fields among family applications.
  • The AI server 400 generates the statistical data in advance based on combinations of patent documents related to family applications filed in the past.
  • For each classification of the technical field specified by a patent office other than the target patent office, the AI server 400 extracts, from among the family applications of the patent documents to which that classification is assigned, the patent documents related to family applications filed with the target patent office.
  • For each classification of the technical field specified by a patent office other than the target patent office, the AI server 400 calculates the number or ratio of times each classification specified by the target patent office is assigned in the extracted patent documents, and stores the result as statistical data.
  • The search information generation unit 421 identifies, from the patent document, the classification of the technical field specified by the patent office of that country. Then, the search information generation unit 421 identifies, as the major classification of the technical field of the acquired patent document, the major classification of each classification whose statistical data stored in association with the identified classification is equal to or greater than a threshold value.
  • For example, the AI server 400 calculates, as statistical data, the number or ratio of specific FI or F-terms assigned in the family applications of applications to which a specific IPC, CPC, or the like is assigned.
  • The search information generation unit 421 identifies the IPC or CPC assigned in each patent document, and specifies, as the major classification of the technical field of each patent document, the theme code of each FI or F-term whose statistical data associated with the identified IPC or CPC is equal to or greater than the threshold value.
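  • The second method can be illustrated with a minimal sketch. It assumes that past family applications are available as (foreign classification, target-office classification) pairs; the function names, the placeholder labels such as IPC_X and THEME_A, and the threshold values are hypothetical and are not taken from the specification.

```python
from collections import Counter, defaultdict


def build_statistics(family_pairs):
    """Count how often each target-office classification co-occurs with a
    foreign classification across past family applications, as ratios."""
    counts = defaultdict(Counter)
    for foreign_class, target_class in family_pairs:
        counts[foreign_class][target_class] += 1
    stats = {}
    for foreign_class, counter in counts.items():
        total = sum(counter.values())
        stats[foreign_class] = {t: n / total for t, n in counter.items()}
    return stats


def major_classes_for(foreign_class, stats, threshold=0.3):
    """Return target-office classifications whose ratio meets the threshold."""
    ratios = stats.get(foreign_class, {})
    return sorted(t for t, r in ratios.items() if r >= threshold)


# Placeholder labels only; real IPC/CPC symbols and theme codes would go here.
pairs = [
    ("IPC_X", "THEME_A"),
    ("IPC_X", "THEME_A"),
    ("IPC_X", "THEME_B"),
    ("IPC_Y", "THEME_B"),
]
stats = build_statistics(pairs)
print(major_classes_for("IPC_X", stats, threshold=0.5))
```

  • Using a ratio rather than a raw count keeps the threshold comparable between frequently and rarely assigned classifications; the specification permits either.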
  • As a third method, the search information generation unit 421 uses a concept search to identify a major classification of the technical field.
  • The search information generation unit 421 uses the concept search described later to extract a predetermined number of patent documents that were filed with the target patent office and are similar to the acquired patent document.
  • In the concept search, for example, patent documents in which the frequency of appearance of each term included in the content information is similar are extracted.
  • The search information generation unit 421 specifies, as the major classification of the technical field of each patent document, each major classification whose number or ratio of assignments in the extracted patent documents is equal to or greater than a threshold value.
  • As a fourth method, the search information generation unit 421 specifies a major classification by machine learning (SVM, etc.) based on the content information.
  • The search information generation unit 421 specifies all the major classifications identified by the first to fourth methods as the major classifications of the technical field of each patent document. Alternatively, the search information generation unit 421 may specify the major classification of the technical field of each patent document by only one or two of the first to fourth methods.
  • The search information generation unit 421 identifies a sub-classification of the technical field of each patent document using AI technology, and generates the specified sub-classification as classification information (step S203).
  • The search information generation unit 421 identifies the sub-classification of the technical field of each patent document by using the second learning model 412 corresponding to the major classification specified in step S202.
  • Each second learning model 412 is pre-trained, using a plurality of learning patent documents described in the language used by the target patent office, to output information on the technical field of a patent document when information on the content information of the patent document is input. In particular, each second learning model 412 is pre-trained to output information on the sub-classification of the technical field of the patent document when the feature amount calculated from the content information of the patent document is input.
  • For example, each second learning model 412 is generated for each sub-classification and is trained using an SVM (Support Vector Machine).
  • Each second learning model 412 is trained to output +1 when the feature amount calculated from the content information of a patent document is input and the patent document matches the corresponding subclass, and to output -1 when it does not match. That is, each second learning model 412 includes an identification plane that discriminates between feature amounts calculated from patent documents that match the corresponding subclass and feature amounts calculated from patent documents that do not.
  • Each second learning model 412 outputs +1 when the input feature amount is located, with respect to the identification plane, on the side that matches the corresponding subclass, and outputs -1 otherwise.
  • Alternatively, when the input feature amount is located on the side that matches the corresponding subclass with respect to the identification plane, each second learning model 412 may output the distance (margin) of the feature amount from the identification plane as a normalized score value. Parameter adjustment and threshold adjustment are performed by a known adjustment method.
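  • As an illustration of the identification plane, the following sketch assumes a linear model with a weight per feature element and a bias (both hypothetical values, as if already learned). It shows the +1/-1 decision and the signed distance from the plane; how that distance is normalized into a score is not specified here and is left out.

```python
import math


def decision_value(weights, bias, features):
    """w . x + b for sparse feature and weight dictionaries."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items()) + bias


def classify(weights, bias, features):
    """+1 on the side of the plane matching the subclass, -1 otherwise."""
    return 1 if decision_value(weights, bias, features) >= 0 else -1


def margin_distance(weights, bias, features):
    """Signed distance of the feature amount from the identification plane."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return decision_value(weights, bias, features) / norm


# Hypothetical learned parameters for one subclass.
weights = {"search": 3.0, "engine": -4.0}
bias = -1.0

print(classify(weights, bias, {"search": 2.0}))
print(margin_distance(weights, bias, {"search": 2.0}))
print(classify(weights, bias, {"engine": 1.0}))
```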
  • As the feature amount, for example, TF-IDF (Term Frequency-Inverse Document Frequency) is used.
  • The AI server 400 uses morphological analysis to decompose the content information in each learning patent document into words (morphemes), calculates the frequency of appearance and the inverse document frequency of each word, and calculates TF-IDF.
  • The inverse document frequency is calculated from patent documents having the same theme code. Further, in order to reduce the number of dimensions of the word frequencies, morphemes whose frequency is below a lower limit threshold or above an upper limit threshold may be removed. Further, the AI server 400 may decompose the document after removing terms that correspond to a fixed format in patent documents, such as "technical field" or "background technology" enclosed in brackets.
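  • The TF-IDF calculation can be sketched as follows. The sketch assumes the documents have already been split into words (the morphological analysis step is not reproduced) and that the input list holds only documents sharing one theme code. The function name and the particular TF and IDF formulas (relative frequency and a plain logarithm) are one common choice and not necessarily the one used in the system.

```python
import math
from collections import Counter


def tf_idf(tokenized_docs):
    """Return one {word: tf-idf} dictionary per document."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each word once per document
    vectors = []
    for tokens in tokenized_docs:
        term_freq = Counter(tokens)
        total = len(tokens)
        vectors.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        })
    return vectors


docs = [
    ["search", "server", "search"],
    ["server", "translation"],
]
vectors = tf_idf(docs)
print(vectors[0])
```

  • A word that appears in every document of the collection, such as "server" here, receives a weight of zero, which is why boilerplate section headings contribute little even if they are not removed beforehand.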
  • A feature amount other than TF-IDF, such as Bag of Words, may be used.
  • As the feature amount, a distributed representation of each morpheme (word), sentence, paragraph, or document, the text itself, and/or classification information of technical fields such as IPC may be used.
  • Feature amount information regarding the number or distribution of feature points, such as corners or intersections of an object extracted from an image in a drawing included in the content information of each patent document, may also be used.
  • The second learning model 412 may be trained using other known machine learning techniques such as logistic regression, MLP (Multilayer Perceptron), RNN (Recurrent Neural Network), CNN (Convolutional Neural Network), and NAM (Neural Attention Model). Further, the second learning model 412 may be trained by combining a plurality of machine learning techniques using a method such as ensemble learning. In that case, the parameters that combine the plurality of machine learning techniques may themselves be obtained by machine learning.
  • The search information generation unit 421 calculates the feature amount from each patent document in the same manner as in the pre-learning process by the AI server 400.
  • The search information generation unit 421 decomposes the content information in the patent document into words using morphological analysis, calculates the appearance frequency and the inverse document frequency of each word, and calculates TF-IDF.
  • The search information generation unit 421 inputs the calculated feature amount into each second learning model 412 corresponding to each sub-classification belonging to the major classification specified in step S202, and acquires the output value from each second learning model 412.
  • The search information generation unit 421 identifies the sub-classification corresponding to each second learning model 412 whose output value is equal to or greater than a threshold value as a sub-classification of the technical field of the acquired patent document, and generates the identified sub-classification of the technical field as search information.
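  • The two-stage lookup can be sketched as below: only the models of sub-classifications that belong to an already identified major classification are evaluated. The nested dictionary, the labels, and the stand-in scoring functions are hypothetical; in the described system each entry would be a trained second learning model 412.

```python
def assign_subclasses(features, majors, models_by_major, threshold=0.0):
    """Evaluate only the subclass models under the identified major
    classifications and keep those whose output meets the threshold."""
    assigned = []
    for major in majors:
        for subclass, model in models_by_major.get(major, {}).items():
            if model(features) >= threshold:
                assigned.append(subclass)
    return assigned


# Stand-in scoring functions; real models would be trained classifiers.
models_by_major = {
    "MAJOR-1": {
        "SUB-1a": lambda f: f.get("hash", 0.0) - 0.5,
        "SUB-1b": lambda f: f.get("engine", 0.0) - 0.5,
    },
    "MAJOR-2": {
        "SUB-2a": lambda f: 1.0,
    },
}

features = {"hash": 0.9}
print(assign_subclasses(features, ["MAJOR-1"], models_by_major))
```

  • Because MAJOR-2 was not identified for this document, its model is never called, which is where the saving in processing time comes from.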
  • For a patent document that is not described in the language used, the search information generation unit 421 calculates the feature amount from the translated text of the patent document in the language used.
  • This feature amount is an example of information on the translated text of the patent document in the language used.
  • The search information generation unit 421 acquires information on the technical field of the patent document by inputting the feature amount calculated from the translated text of the patent document in the language used into the second learning model 412.
  • As a result, the management system 1 can appropriately assign technical fields to patent documents in various languages regardless of the language in which each patent document is described.
  • The search information generation unit 421 may also input into each second learning model 412 a patent document that is described in the language used by the target patent office but to which no sub-classification of the technical field has been assigned, and specify the sub-classification of that patent document.
  • The search information generation unit 421 may specify a word (morpheme), a sentence, a paragraph, or the like that is the basis for assigning the classification. For example, when the second learning model 412 is trained by SVM or logistic regression using TF-IDF or the like as the feature amount, the weight of each element in the feature amount is determined at the time of learning.
  • The search information generation unit 421 calculates a multiplication value by multiplying each element of the feature amount calculated from the patent document by the weight defined for that element, and specifies each word corresponding to an element whose multiplication value is equal to or greater than a predetermined threshold value as a word on which the classification is based.
  • Further, the search information generation unit 421 may calculate, for each sentence or paragraph in the patent document, the total of the multiplication values of the words included in that sentence or paragraph, and specify each sentence or paragraph whose total is equal to or greater than a predetermined threshold value as a sentence or paragraph on which the classification is based.
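  • A minimal sketch of this basis extraction follows. It assumes a linear model whose learned weights are available per word, and it sums each distinct word once per sentence; that choice, the function names, and the example values are assumptions for illustration.

```python
def word_contributions(features, weights):
    """Multiply each feature element by the weight learned for it."""
    return {word: value * weights.get(word, 0.0) for word, value in features.items()}


def basis_words(contributions, threshold):
    """Words whose contribution meets the threshold."""
    return sorted(w for w, c in contributions.items() if c >= threshold)


def basis_sentences(sentences, contributions, threshold):
    """Indices of sentences whose summed word contributions meet the threshold."""
    selected = []
    for index, words in enumerate(sentences):
        total = sum(contributions.get(w, 0.0) for w in set(words))
        if total >= threshold:
            selected.append(index)
    return selected


# Hypothetical TF-IDF values and learned weights.
features = {"hash": 0.5, "vector": 0.25, "the": 0.5}
weights = {"hash": 2.0, "vector": 4.0, "the": 0.0}
contributions = word_contributions(features, weights)

print(basis_words(contributions, threshold=0.5))
print(basis_sentences([["the", "hash"], ["the"]], contributions, threshold=0.5))
```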
  • The search information generation unit 421 notifies the inquiry server 100 of the information indicating the specified word, sentence, or paragraph in association with the information indicating each patent document and its technical field.
  • When the inquiry server 100 receives the inquiry request signal from the terminal device 10, the inquiry server 100 transmits to the terminal device 10 the text data and image data of the patent document corresponding to the designated document number, together with the technical field specified by the search information generation unit 421 and the word, sentence, or paragraph on which it is based.
  • As a result, the user of the terminal device 10 can examine the validity of the specified technical field from the words, sentences, or paragraphs on which it is based, and can modify the technical field as necessary.
  • The search information generation unit 421 may calculate information on the sub-classification of the technical field for each block in the patent document, and specify the sub-classification of the technical field based on the information calculated for each block.
  • A block is a sentence, a paragraph, or the like.
  • In this case, the second learning model 412 is generated for each subclass and is trained by SVM using, as the feature amount, Bag of Words or the like generated from each block included in the learning patent documents.
  • Each second learning model 412 is trained to output, as a normalized score value, the distance (margin) of the input feature amount from the identification plane when the input feature amount is located on the side that matches the corresponding subclass.
  • A block not related to a specific subclass may be used as a learning sample that does not correspond to that subclass.
  • The search information generation unit 421 inputs the feature amount calculated from each block included in the patent document into the corresponding second learning model 412, and acquires the score value output from each second learning model 412.
  • The search information generation unit 421 compares the acquired score values with a threshold value and, depending on the result, identifies the subclass corresponding to the second learning model 412 as a subclass of the technical field of the patent document.
  • Alternatively, the search information generation unit 421 may set a second threshold value based on the maximum of the output values for the patent document and, when there are a predetermined number or more of blocks whose score value is equal to or higher than the second threshold value, identify the subclass corresponding to that second learning model 412 as a subclass of the technical field of the patent document.
  • The second threshold value is set to, for example, a predetermined multiple (for example, 0.5 times) of the maximum value.
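  • The second-threshold variant can be sketched as follows, assuming non-negative normalized block scores for one subclass. The function name, the required block count, and the example scores are hypothetical; the 0.5 multiple is the example value given above.

```python
def matches_subclass(block_scores, min_blocks, ratio=0.5):
    """True when at least min_blocks blocks score at or above
    ratio * (maximum block score) for this subclass."""
    if not block_scores:
        return False
    second_threshold = ratio * max(block_scores)
    hits = sum(1 for score in block_scores if score >= second_threshold)
    return hits >= min_blocks


scores = [0.9, 0.5, 0.2, 0.6]  # one score per block of the patent document
print(matches_subclass(scores, min_blocks=3))
print(matches_subclass(scores, min_blocks=4))
```

  • Tying the threshold to the document's own maximum score makes the test relative: a document with uniformly modest scores can still qualify when enough of its blocks agree.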
  • As a result, the search information generation unit 421 can specify the technical field of the patent document with higher accuracy.
  • As the feature amount, TF-IDF, each morpheme (word), or a distributed representation of a sentence, a paragraph, or a document may be used.
  • The second learning model 412 may be trained using other known machine learning techniques such as logistic regression, MLP, RNN, CNN, and NAM, or using a combination of a plurality of machine learning techniques.
  • The search information generation unit 421 may use a different threshold value for each block included in the patent document.
  • For example, the search information generation unit 421 makes the threshold value for blocks included in the claims or the outline of the invention smaller than the threshold value for other blocks.
  • As a result, the search information generation unit 421 can specify the technical field of the patent document by giving priority to the terms included in the claims or the outline of the invention.
  • The search information generation unit 421 may specify a subclass of the technical field based on both a score value calculated from the entire content information of the patent document and a score value calculated from each block in the patent document. In that case, the search information generation unit 421 inputs the feature amount calculated from the entire content information of the patent document into a learning model trained using the entire content information of the learning patent documents, and acquires a first score value. Further, the search information generation unit 421 inputs the feature amount calculated from each block in the patent document into a learning model trained using each block in the learning patent documents, and acquires a second score value.
  • The AI server 400 stores in advance, in the fourth storage device 410, a learning model pre-trained to output whether or not the patent document matches the corresponding subclass when the first score value and the second score value are input.
  • The search information generation unit 421 inputs the acquired first score value and second score value into that learning model, and determines whether or not the patent document matches the corresponding subclass.
  • Alternatively, the search information generation unit 421 may determine whether or not the patent document matches the corresponding subclass depending on whether the sum or weighted sum of the first score value and the second score value is equal to or greater than a predetermined value. As a result, the search information generation unit 421 can specify the technical field of the patent document with higher accuracy.
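  • The weighted-sum decision can be sketched in a few lines. How the per-block scores are reduced to a single second score value is not stated, so the sketch assumes the maximum block score is used; the weights, the cutoff, and the function name are likewise hypothetical.

```python
def matches_by_weighted_sum(first_score, block_scores,
                            w_full=0.5, w_block=0.5, cutoff=0.5):
    """Combine the whole-document score with a block-level score.
    Assumption: the block-level score is the maximum over blocks."""
    second_score = max(block_scores) if block_scores else 0.0
    combined = w_full * first_score + w_block * second_score
    return combined >= cutoff


print(matches_by_weighted_sum(0.75, [0.25, 0.5]))
print(matches_by_weighted_sum(0.75, [0.25, 0.5], cutoff=0.75))
```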
  • The second learning model 412 may be pre-trained to output information indicating the subclass that best matches the patent document when the feature amount calculated from each patent document is input.
  • In that case, the second learning model 412 is trained by, for example, deep learning, using a plurality of learning patent documents.
  • The learning model, which is a neural network, has a multi-layer structure composed of an input layer, an intermediate layer, and an output layer. Each element of the feature amount, such as TF-IDF calculated from each patent document, is input to a node of the input layer. A weighted sum, based on predetermined weights, of the values output from the nodes of the input layer is input to each node of the intermediate layer.
  • A weighted sum, based on predetermined weights, of the values output from the nodes of the intermediate layer is input to each node of the output layer.
  • The output layer, for example, outputs the input value as it is.
  • Each weight is set so that the difference between the value output by the output layer and the value indicating the subclass of the technical field assigned to the learning patent document becomes small.
  • For this learning, a known method such as error backpropagation is used.
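  • The forward pass of such a three-layer network can be sketched as follows. The weights are fixed illustrative values rather than learned ones, and the ReLU applied in the intermediate layer is an assumption, since the activation function is not specified; the output layer passes its weighted sum through unchanged, as described.

```python
def forward(x, w_hidden, w_out):
    """Input layer -> intermediate layer (weighted sum + ReLU, assumed)
    -> output layer (weighted sum, output as-is)."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)))
        for row in w_hidden
    ]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


# Illustrative weights: 2 inputs, 2 intermediate nodes, 2 output nodes.
w_hidden = [[1.0, 0.0], [0.0, -1.0]]
w_out = [[2.0, 1.0], [0.5, 3.0]]

outputs = forward([1.0, 2.0], w_hidden, w_out)
best_subclass_index = max(range(len(outputs)), key=outputs.__getitem__)
print(outputs, best_subclass_index)
```

  • Each output node stands for one subclass, so the index of the largest output identifies the subclass that best matches the input feature amount.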
  • The search information generation unit 421 inputs the calculated feature amount to each second learning model 412, and identifies the subclass corresponding to the output value from each second learning model 412 as a subclass of the technical field of the acquired patent document.
  • As described above, the search information generation unit 421 uses the learning model to generate, as search information, the technical field or translation data of each patent document based on the bibliographic information or content information of each patent document.
  • Each database holds a large number of patent documents to which technical fields have been manually assigned, so the search information generation unit 421 can identify the technical field of each patent document with high accuracy by using a large amount of teacher data.
  • For a patent document in which the content of the invention is not described in the language used by the target patent office, the search information generation unit 421 generates a translated text in which the content of the invention is translated into the language used.
  • The search information generation unit 421 generates the classification information of each patent document using the learning model, based on the content of the invention for patent documents described in the language used, and based on the translated text of the content of the invention for patent documents not described in the language used. That is, the search information generation unit 421 translates a patent document in a foreign language into the language corresponding to the learning model, and then identifies the sub-classification of the technical field.
  • As a result, the AI server 400 does not need to prepare a learning model for each of a plurality of languages, which reduces the workload of the personnel required for pre-learning and reduces the storage capacity required of the fifth storage device 510. Further, the AI server 400 only needs to generate a learning model for one specific language, so sufficient time and effort can be devoted to it and the accuracy of the learning model can be improved. In particular, in recent years, the translation of patent documents using AI technology has improved through learning with pairs of patent documents related to family applications, and the search information generation unit 421 can acquire high-quality translated text. Therefore, the search information generation unit 421 can identify the sub-classification of the technical field with high accuracy by using the high-quality translated text.
  • Alternatively, the AI server 400 may store in the fourth storage device 410, for each of a plurality of languages other than the language used by the target patent office, a learning model pre-trained using learning patent documents described in that language. In that case, the AI server 400 may use, as the learning patent documents written in a language other than the language used, translated texts of patent documents described in the language used.
  • The learning model may also be trained using the subclasses of the technical field assigned to the family applications of the learning patent documents.
  • In that case, the search information generation unit 421 identifies the sub-classification of the technical field by using the learning model corresponding to the language in which the patent document is described, without translating the acquired patent document. As a result, the search information generation unit 421 can identify the sub-classification of the technical field in a short time, which shortens the processing time of the search information generation process and reduces the processing load.
  • Further, the search information generation unit 421 identifies the major classification of the technical field of each patent document, and then specifies the sub-classification of the technical field by using the learning models corresponding to that major classification. As a result, the search information generation unit 421 does not need to apply each patent document to the learning models corresponding to all the subclasses, which shortens the processing time of the search information generation process and reduces the processing load.
  • The search information generation unit 421 may also specify the sub-classification of the technical field from the content information of each patent document without specifying the major classification of the technical field.
  • The search information generation unit 421 generates a second feature vector from the content information of each patent document (step S204).
  • As the second feature vector, for example, TF-IDF calculated from the content information is used.
  • A feature vector other than TF-IDF, such as Bag of Words or BM25, may be used.
  • A distributed representation of each morpheme (word), sentence, paragraph, or document may also be used.
  • As the distributed representation, Word2Vec, Doc2Vec, SCDV (Sparse Composite Document Vectors), and the like are used.
  • The search information generation unit 421 generates, for each patent document, a plurality of first feature vectors having different numbers of dimensions from the second feature vector (step S205).
  • Using a plurality of different LSH (Locality-Sensitive Hashing) hash functions, the search information generation unit 421 generates, as a first feature vector, a feature vector whose elements are the hash values obtained by converting the second feature vector of each patent document.
  • An LSH hash function is a function designed so that the closer two feature vectors are, the closer their hash values are.
  • Each hash function is defined by the following formula.
  • $h(v) = \left\lfloor \dfrac{a \cdot v + b}{W} \right\rfloor$
  • Here, $v$ is the second feature vector.
  • $a$ is an $m$-dimensional vector composed of the same number of random variables as the number of dimensions $m$ of the second feature vector $v$, each drawn from a p-stable distribution, for example the Gaussian distribution.
  • $b$ is a real number selected uniformly at random from the interval $[0, W]$ ($W > 0$).
  • By $h(v)$, the feature space of the second feature vector $v$ is divided by hyperplanes that are orthogonal to the vector $a$ and spaced at equal intervals.
  • The first feature vector $g_k(v)$ is defined by the following equation.
  • $g_k(v) = (h_1(v), h_2(v), \ldots, h_k(v))$
  • $h_1(v)$ to $h_k(v)$ are hash values of hash functions $h(v)$ defined by different, randomly set $a$ and $b$.
  • The first feature vector $g_k(v)$ is a $k$-dimensional feature vector. Among the subspaces (buckets) into which the feature space of the second feature vector $v$ is divided by $k$ sets of equally spaced hyperplanes, it represents the subspace (bucket) corresponding to the patent document.
  • The search information generation unit 421 sets equations for a predetermined number (two or more) of first feature vectors $g_k(v)$ having different numbers of dimensions $k$.
  • For example, the search information generation unit 421 sets equations for nine first feature vectors $g_1(v), g_2(v), g_4(v), \ldots, g_{256}(v)$, whose numbers of dimensions $k$ are 1, 2, 4, 8, 16, 32, 64, 128, and 256.
  • The search information generation unit 421 generates a plurality of first feature vectors having different numbers of dimensions by substituting the second feature vector $v$ into each of the set equations for the first feature vectors $g_k(v)$.
  • The number of dimensions $k$ of the first feature vector $g_k(v)$ corresponds to the number of divisions by which the feature space of the second feature vector is divided into subspaces. The larger the number of dimensions $k$, the larger the number of divisions.
  • Accordingly, the larger the number of dimensions $k$, the larger the amount of information represented by the first feature vector $g_k(v)$.
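  • The generation of first feature vectors can be sketched as follows. The sketch assumes the standard p-stable form floor((a . v + b) / W) for each hash function, which is consistent with the definitions of a, b, and W given here; the function names, the bucket width, the seed, and the small set of dimension counts are illustrative choices.

```python
import math
import random


def make_hash(dim, width, rng):
    """One hash function: a is Gaussian (2-stable), b is uniform on [0, width]."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def h(v):
        projection = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((projection + b) / width)

    return h


def make_first_feature_fn(dim, k, width, rng):
    """g_k(v) = (h_1(v), ..., h_k(v)) with k independently drawn hashes."""
    hashes = [make_hash(dim, width, rng) for _ in range(k)]
    return lambda v: tuple(h(v) for h in hashes)


rng = random.Random(0)  # fixed seed so the same functions are reused
g_fns = {k: make_first_feature_fn(3, k, 4.0, rng) for k in (1, 2, 4, 8)}

v = [0.2, 0.1, 0.7]  # a second feature vector (illustrative values)
signatures = {k: g(v) for k, g in g_fns.items()}
print({k: len(sig) for k, sig in signatures.items()})
```

  • The same hash functions must be applied to every document and to the query, so they are created once and kept; a small k gives coarse buckets that match many documents, while a large k gives fine buckets that match only close neighbours.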
  • The search information generation unit 421 determines whether or not the processing of all the patent documents collected by the information processing apparatus 15 in a predetermined period has been completed (step S206). When there is a patent document whose processing has not yet been completed, the search information generation unit 421 returns the processing to step S201 and repeats steps S201 to S206. On the other hand, when the processing of all the patent documents is completed, the search information generation unit 421 ends the series of steps.
  • The AI server 400 may have a learning model pre-trained to output the feature vector or keywords of each patent document, or the metadata of its drawings, when the text or drawings of the content information of the patent document are input.
  • The learning model is trained using, for example, deep learning, and each weight is set so that the difference between the value output by the output layer and the value indicating the feature vector, keyword, or metadata assigned to the learning patent document becomes small.
  • The search information generation unit 421 inputs the text or drawings of the content information of each patent document into each learning model, and determines the feature vector, keyword, or metadata corresponding to the output value from each learning model as the feature vector, keyword, or metadata of that patent document.
  • The search information generation unit 421 generates the determined feature vector, keyword, or metadata as search information.
  • The search information generation unit 421 may generate common search information for the patent documents related to a family application.
  • For example, the search information generation unit 421 uses the search information generated for the patent document whose representative document flag is set to valid as the common search information for the patent documents related to the family application. That is, the search information generation unit 421 uses the search information, including the classification information, generated for the patent document related to a specific application as the search information for the patent documents related to the family applications of that specific application.
  • As a result, the search information generation unit 421 can set the sub-classification of the technical field in a shorter time, which shortens the processing time of the search information generation process and reduces the processing load.
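  • Sharing search information within a family can be sketched as below. The record layout (a doc_id and a representative flag per document), the identifiers, and the function name are hypothetical; the point is only that search information is generated once, for the representative document, and then reused.

```python
def share_family_search_info(family_docs, generated_info):
    """Reuse the representative document's search information for every
    document in the same family."""
    representative = next(d for d in family_docs if d["representative"])
    common = generated_info[representative["doc_id"]]
    return {d["doc_id"]: common for d in family_docs}


family = [
    {"doc_id": "DOC-A", "representative": True},
    {"doc_id": "DOC-B", "representative": False},
]
# Search information was generated only for the representative document.
generated = {"DOC-A": {"classification": ["SUB-1"]}}

shared = share_family_search_info(family, generated)
print(shared["DOC-B"])
```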
  • FIG. 22 is a schematic diagram for explaining the execution timing of each process in the update process shown in FIG.
  • the update process is executed in a cycle of 7 days (1 week).
  • The format information acquisition process, the bibliographic information acquisition process, and the search information generation process are started at the same time on the first day. The format information acquisition process and the bibliographic information acquisition process are completed on the third day, but the search information generation process is not completed until the 4th day.
  • the process of generating the first integrated information using the formal information and the bibliographic information is started on the third day and completed on the fourth day.
  • the process of generating the second integrated information using the first integrated information and the search information is started on the 5th day and completed on the 6th day.
  • the process of generating the third integrated information using the second integrated information and the content information is started on the 6th day and completed on the 7th day.
  • the management server 300 starts the first integrated information generation process without waiting for the completion of the search information, which requires a large amount of time to generate.
  • the management server 300 can generate the first integrated information by the time the search information is completed, and can efficiently generate the second integrated information in a short time by using the generated first integrated information and the search information.
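  • The weekly timing described above can be written down as a small table and checked for consistency. The following is only a sketch: the task names and the `Task` structure are hypothetical, the day numbers are those given in the description, and the content information acquisition is left out because its timing is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    start_day: int
    end_day: int
    depends_on: tuple = ()


# Day numbers follow the 7-day cycle described in the text.
SCHEDULE = [
    Task("format_info", 1, 3),
    Task("bibliographic_info", 1, 3),
    Task("search_info", 1, 4),
    Task("first_integrated", 3, 4, ("format_info", "bibliographic_info")),
    Task("second_integrated", 5, 6, ("first_integrated", "search_info")),
    Task("third_integrated", 6, 7, ("second_integrated",)),
]


def check_schedule(tasks):
    """Return (task, dependency) pairs where a task starts before its input is complete."""
    by_name = {t.name: t for t in tasks}
    violations = []
    for t in tasks:
        for dep in t.depends_on:
            if t.start_day < by_name[dep].end_day:
                violations.append((t.name, dep))
    return violations
```

  • In this table the first integrated information does not depend on the search information, so it can be finished on the 4th day while the slower search information generation is still running.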
  • The management server 300 does not generate the first integrated information each time the format information and bibliographic information of each patent document are acquired, but generates the first integrated information after the acquisition of the format information and bibliographic information of the plurality of patent documents collected by the information processing apparatus 15 during a predetermined period is completed.
  • the management server 300 generates the second integrated information after the first integrated information and the search information of the plurality of patent documents collected by the information processing apparatus 15 are completed in a predetermined period, and transmits the second integrated information to the inquiry server 100.
  • the inquiry server 100 generates the third integrated information after the second integrated information of the plurality of patent documents collected by the information processing apparatus 15 is completed in a predetermined period, and transmits the third integrated information to the search server 500.
  • the overhead related to the generation process of the third integrated information by the inquiry server 100 and the update process of the management table in the search server 500, including the transmission / reception process of the third integrated information in each server, is reduced.
  • FIG. 23 shows an example of an operation sequence related to output processing by the management system 1.
  • The operation sequence described below is executed mainly by the control device of each server or device, in cooperation with each element of that server or device, based on a program stored in advance in the storage device of each server or device of the management system 1. This sequence of operations is executed on a regular basis.
  • The terminal device 10 transmits, to the UI server 12, a search screen display data request signal for requesting acquisition of search screen display data for displaying a search screen on which the user searches for patent documents (step S301).
  • When the UI server 12 receives the search screen display data request signal from the terminal device 10, the UI server 12 transmits the search screen display data to the terminal device 10 (step S302).
  • The search screen display data is written in a known language such as HTML or JavaScript (registered trademark).
  • When the terminal device 10 receives the search screen display data from the UI server 12, the terminal device 10 displays the search screen according to the search screen display data (step S303).
  • the terminal device 10 displays a search screen on a web browser or the like.
  • FIG. 24 is a schematic diagram showing an example of the search screen 2400.
  • the search screen 2400 shown in FIG. 24 is an example of a search screen when the target patent office is the Japan Patent Office.
  • the search screen 2400 includes a search designated area 2410 and a search result display area 2430.
  • The display data received from the UI server 12 does not include data for displaying the search result display area 2430, and the search result display area 2430 is not displayed on the search screen displayed in step S303.
  • The search designation area 2410 includes an issuing country designation box 2411, a type selection box 2412, an examination target designation box 2413, a theme designation button 2414, a publicly known date designation box 2415, a search expression designation box 2416, an image designation box 2417, a concept search selection button 2418, a machine translation sentence selection button 2419, an estimation classification selection button 2420, a search button 2421, and the like.
  • the issuing country designation box 2411 is a box for designating the country in which the patent document to be searched is issued. When designating a foreign country as the country that issued the patent document to be searched, the user can further specify that country.
  • the type selection box 2412 is a box for selecting whether the patent documents to be searched are all documents or only published documents.
  • the examination target designation box 2413 is a box for designating a patent document to be examined.
  • the theme designation button 2414 is a button for designating a theme code. When the theme designation button 2414 is pressed, a box for entering the theme code is displayed.
  • the publicly known date designation box 2415 is a box for designating the publicly known date of the patent document to be searched.
  • the search expression designation box 2416 is a box for inputting a search expression (keyword, FI and / or F-term).
  • the image designation box 2417 is a box for inputting an image.
  • the image in the drawing included in the patent document to be examined designated in the examination target designation box 2413 may be selectively displayed.
  • the concept search selection button 2418 is a button for selecting whether or not to execute the concept search.
  • the machine translation sentence selection button 2419 is a button for selecting whether or not to include the translation sentence generated by the AI server 400 in the search target.
  • the estimation classification selection button 2420 is a button for selecting whether or not to include the classification generated by the AI server 400 in the search target.
  • the search button 2421 is a button for executing a search under the conditions specified or selected in each of the above buttons and boxes.
  • the terminal device 10 transmits a search request signal for requesting execution of the search to the gateway server 11 (step S304).
  • the search request signal includes each condition (search query) specified on the search screen 2400, that is, designated data for search specified by the user.
  • When the gateway server 11 receives the search request signal from the terminal device 10, it transmits a search instruction signal for instructing the execution of the search to the plurality of search servers 500 (step S305).
  • the gateway server 11 converts the search query included in the search request signal into a format (SQL format or the like) that can be processed by the search server 500, and transmits the search instruction signal including the converted search query to each search server 500.
  • the gateway server 11 registers the search query included in the search request signal received from each terminal device 10 in the reception queue, and transmits the search instruction signal based on the processing status of the search server 500. As a result, the gateway server 11 functions as a load balancer and can level the load of the search server 500.
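  • The reception queue and load leveling described above might be sketched as follows. This is an illustration under stated assumptions, not the method of the specification: the class `GatewaySketch`, the `max_in_flight` limit, and the rule "release a query only while every search server is below the limit" are all hypothetical. The sketch does follow the description in one respect: each released query is sent to every search server.

```python
from collections import deque


class GatewaySketch:
    """Hypothetical gateway that queues queries and fans them out to all search servers."""

    def __init__(self, server_ids, max_in_flight=2):
        self.queue = deque()                     # reception queue of pending queries
        self.max_in_flight = max_in_flight       # assumed per-server limit
        self.in_flight = {sid: 0 for sid in server_ids}

    def receive(self, query):
        """Register a query from a terminal device in the reception queue."""
        self.queue.append(query)

    def dispatch(self):
        """Send queued queries to all servers while every server has spare capacity."""
        sent = []
        while self.queue and all(n < self.max_in_flight for n in self.in_flight.values()):
            query = self.queue.popleft()
            for sid in self.in_flight:
                self.in_flight[sid] += 1
                sent.append((sid, query))
        return sent

    def complete(self, server_id):
        """Record that a search server has finished one query."""
        self.in_flight[server_id] -= 1
```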
  • When the search unit 522 of each search server 500 receives the search instruction signal from the gateway server 11 via the fifth communication device 501, the search unit 522 executes the search process according to the search query included in the search instruction signal (step S306).
  • The search unit 522 extracts, from the third integrated information stored in the search database 600, patent documents that satisfy the search query (condition) included in the search instruction signal, that is, patent documents corresponding to the designated data specified by the user, and generates first display data for displaying the plurality of extracted patent documents side by side. The details of the search process will be described later.
  • the search unit 522 transmits the first display data generated in the search process to the gateway server 11 via the fifth communication device 501 (step S307).
  • When the gateway server 11 receives the first display data from each search server 500, the gateway server 11 integrates the first display data received from each search server 500 and transmits the integrated first display data to the terminal device 10 (step S308).
  • When the terminal device 10 receives the integrated first display data from the gateway server 11, the terminal device 10 stores the received first display data and displays a plurality of patent documents side by side according to the first display data (step S309).
  • the search result display area 2430 is further displayed on the search screen 2400.
  • the search result display area 2430 includes a document number 2431 of each patent document, a theme code 2432, FI 2433, a publicly known date 2434, a title of the invention 2435, a check box 2436, and the like for each patent document shown in the search result. Further, the search result display area 2430 includes a scroll bar 2437 and an update button 2438.
  • each patent document is displayed in the order determined in the search process.
  • the check box 2436 is a button for designating a patent document of interest to the user.
  • the scroll bar 2437 is a bar for scrolling the search result display area 2430 so that the non-displayed patent documents can be displayed when the search result display area 2430 contains a number of patent documents that cannot be displayed at one time.
  • the update button 2438 is a button for displaying each patent document in a sorted manner based on the degree of similarity with the patent document specified by the check box 2436.
  • the terminal device 10 transmits an update request signal for requesting the rearrangement of patent documents to the gateway server 11 (step S310).
  • the update request signal includes the patent document designated by the check box 2436, that is, the information indicating the patent document designated by the user in the terminal device 10.
  • the update request signal may include information indicating a patent document continuously displayed in the search result display area 2430 for a predetermined time or longer.
  • When the gateway server 11 receives the update request signal from the terminal device 10, it transmits an update instruction signal for instructing the rearrangement of patent documents to the plurality of search servers 500 (step S311).
  • The gateway server 11 converts the information indicating the patent document included in the update request signal into a format that can be processed by the search server 500 (SQL format, etc.), and transmits an update instruction signal including the converted information to each search server 500.
  • When the search unit 522 of each search server 500 receives the update instruction signal from the gateway server 11 via the fifth communication device 501, the search unit 522 executes the update process according to the information indicating the patent document included in the update instruction signal (step S312).
  • the search unit 522 generates the second display data in which the patent documents displayed by the first display data are rearranged based on the degree of similarity with the patent documents specified in the update instruction signal. The details of the update process will be described later.
  • The search unit 522 transmits the second display data generated in the update process to the gateway server 11 via the fifth communication device 501 (step S313).
  • When the gateway server 11 receives the second display data from each search server 500, the gateway server 11 integrates the second display data received from each search server 500 and transmits the integrated second display data to the terminal device 10 (step S314).
  • the update process may be executed by one search server 500. In that case, the gateway server 11 transmits the second display data from one search server 500 to the terminal device 10.
  • When the terminal device 10 receives the second display data from the gateway server 11, the terminal device 10 stores the received second display data and rearranges and displays the patent documents displayed by the first display data according to the second display data (step S315). With this, the operation sequence related to the output processing is completed.
  • The search server 500 may feed back to the AI server 400 the patent document to be examined, the theme code (major classification of the technical field), the FI and F-term (minor classification of the technical field), and the like included in the search query in the received search instruction signal.
  • the search server 500 transmits the patent document to be examined, the major classification of the technical field, and the minor classification of the technical field to the AI server 400.
  • the AI server 400 updates the learning model for identifying the technical field by using the received patent document, the major classification of the technical field, and the minor classification of the technical field.
  • the management system 1 can continuously improve the accuracy of the technical field specified by the learning model.
  • FIG. 25 is a flowchart showing an example of the operation of the search process in the search server 500.
  • the search process shown in FIG. 25 is executed in step S306 of the output process shown in FIG. 23.
  • the search unit 522 determines whether or not it is specified to execute the concept search in the search query (condition) included in the received search instruction signal (step S401).
  • When execution of the concept search is not specified, the search unit 522 refers to the third integrated information stored in the search database 600 and extracts patent documents that satisfy the search query (condition) included in the search instruction signal (step S402), and the process proceeds to step S408.
  • The search unit 522 extracts, from the patent documents stored in the search database 600, those for which the issuing country, publicly known date, theme code, FI, F-term, and/or each keyword specified in the search query match the corresponding data items in the third integrated information.
  • In the search database 600, a plurality of pieces of information relating to one patent document are collectively stored as the third integrated information. Since the search server 500 can search for a plurality of specified pieces of information at once even when the user specifies a plurality of pieces of information and performs a search, the search time of the search process can be shortened and the processing load can be reduced.
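  • The single-pass matching over one integrated record can be sketched as follows. The record layout and the field names (`doc_id`, `country`, `theme_code`, `keywords`) are hypothetical and chosen only for illustration; the point is that every condition of the query is checked against the same record in one step, with no joining of results from separate stores.

```python
def matches(record, query):
    """Return True if one integrated record satisfies every condition in the query."""
    for field_name, wanted in query.items():
        have = record.get(field_name)
        if isinstance(wanted, (set, frozenset)):
            # Set-valued conditions (e.g. keywords) must all be present in the record.
            if not wanted <= set(have or ()):
                return False
        elif have != wanted:
            return False
    return True


def search(records, query):
    """Scan the integrated records once and return the ids of matching documents."""
    return [r["doc_id"] for r in records if matches(r, query)]
```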
  • The search unit 522 also includes the translated text or classification generated by the AI server 400 in the search target when this is selected in the search query.
  • In the search database 600, translations of patent documents written in a language different from the language used by the target patent office are stored in the language used.
  • For a patent document written in a language different from the language used, the search unit 522 performs the search by referring to the translated text in the language used. Therefore, the user can efficiently search for patent documents written in various languages at once without being aware of the difference in language of each patent document, and the management system 1 can improve convenience for the user.
  • the search unit 522 may search the patent documents described in that language. As a result, the search unit 522 can execute the search with higher accuracy.
  • On the other hand, when execution of the concept search is specified, the search unit 522 generates a plurality of first feature vectors for the keyword or image specified in the search query included in the search instruction signal, that is, for the designated data specified by the user (step S403).
  • the search unit 522 generates the first feature vector of the keyword or image specified in the search query in the same manner as in steps S204 and S205 of the search information generation process shown in FIG.
  • the search unit 522 selects the first feature vector to be compared (step S404).
  • The first time, the search unit 522 selects the first feature vector having the largest number of dimensions among the plurality of first feature vectors as the first feature vector to be compared.
  • From the second time onward, the search unit 522 selects, as the first feature vector to be compared, the first feature vector having the next largest number of dimensions after the first feature vector selected last time.
  • the search unit 522 refers to the third integrated information stored in the search database 600, and extracts patent documents that satisfy the search query (condition) included in the search instruction signal (step S405).
  • The search unit 522 extracts, from the patent documents stored in the search database 600, those for which the issuing country, publicly known date, theme code, each keyword, FI, F-term, and/or the first feature vector to be compared match the corresponding data items in the third integrated information.
  • In general, an information processing device determines whether or not two feature vectors correspond by calculating the similarity between the two feature vectors (for example, cosine similarity) and determining whether or not the similarity is equal to or greater than a threshold value. Therefore, such an information processing apparatus must first extract the patent documents that match the issuing country, the publicly known date, the theme code, each keyword, the FI, and the F-term, and then calculate the similarity of the feature vector for each extracted patent document and identify the patent documents whose similarity is equal to or higher than the threshold value.
  • The first feature vector is a feature vector whose elements are hash values obtained by transforming the second feature vector of each patent document using LSH, and it represents one of the subspaces (buckets) obtained by dividing the feature space, which has the same number of dimensions as the second feature vector, by a set of hyperplanes.
  • When the search server 500 determines whether or not two first feature vectors correspond, it can make this determination with high accuracy by determining whether or not the subspaces represented by the two first feature vectors match. That is, the search server 500 can determine with high accuracy whether or not the two first feature vectors correspond only by determining whether or not each element (hash value) of the first feature vectors matches.
  • The search server 500 can handle the comparison between the first feature vectors in the same manner as the comparison between character strings, and can perform the collation of the issuing country, publicly known date, theme code, each keyword, FI, and F-term together with the collation of the first feature vectors. Therefore, the search server 500 can reduce the processing load of the search process and can search a large number of patent documents at high speed.
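  • How a hash of this kind can be compared like a character string may be illustrated with random-hyperplane LSH. This is a sketch under assumptions: the specification says only that LSH and a set of hyperplanes are used, so the Gaussian hyperplanes, the one-bit-per-hyperplane encoding, and the function names below are illustrative choices rather than the described implementation.

```python
import random


def make_hyperplanes(num_planes, dim, seed=0):
    """Draw `num_planes` random hyperplane normals in a `dim`-dimensional feature space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def lsh_signature(vector, hyperplanes):
    """Encode which side of each hyperplane the vector lies on, as a string of bits.

    Vectors in the same subspace (bucket) receive the same string, so two
    signatures can be compared with ordinary string equality.
    """
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)
```

  • With signatures of this form, the feature-vector condition becomes one more equality test next to the issuing country or theme code. Using several hyperplane sets of different sizes would give the plurality of first feature vectors with different numbers of dimensions.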
  • The search unit 522 calculates an evaluation value for each extracted patent document (step S406). For example, the search unit 522 sets the initial value of the evaluation value of each patent document to 0, and each time a patent document is extracted, adds to its evaluation value the number of dimensions of the first feature vector used when that patent document was extracted. The larger the number of dimensions, the larger the amount of information represented by the first feature vector, and the more likely it is that the extracted patent document corresponds to the keyword or image specified in the search query. Therefore, the search unit 522 can accurately extract the patent document corresponding to the keyword or image specified in the search query by increasing the evaluation value as the number of dimensions increases.
  • The search unit 522 determines whether or not the number of patent documents whose evaluation value is equal to or greater than the reference value is equal to or greater than a predetermined number, and whether or not the number of dimensions of the first feature vector to be compared is the minimum number of dimensions (step S407).
  • The reference value and the predetermined number are set in advance. When the number of patent documents whose evaluation value is equal to or more than the reference value is less than the predetermined number and the number of dimensions of the first feature vector to be compared is not the minimum number of dimensions, the search unit 522 returns the process to step S404 and repeats the processing of steps S404 to S407.
  • The search unit 522 compares the first feature vector generated for the designated data with the first feature vector generated for each patent document in descending order of the number of dimensions of the first feature vector, and extracts the patent documents corresponding to the designated data.
  • the search unit 522 can accurately extract patent documents that approximate the designated data by comparing the first feature vectors in descending order of the amount of information.
  • The search unit 522 can end the search process when a sufficient number of patent documents have been extracted, so that the processing time of the search process can be shortened. Further, the search unit 522 completes the search process after repeating the processes of steps S404 to S407 at most as many times as there are first feature vectors with different numbers of dimensions, so that an increase in the processing time of the search process can be suppressed.
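  • The loop of steps S404 to S407 can be sketched as follows. The data layout is hypothetical: each signature is stored under its number of dimensions, and the other conditions of the search query (issuing country, theme code, and so on) are omitted for brevity. Only the control flow is taken from the description above.

```python
def concept_search(query_sigs, doc_sigs, reference_value, required_count):
    """Compare signatures from the largest number of dimensions downward.

    query_sigs: {dims: signature} for the designated data.
    doc_sigs:   {doc_id: {dims: signature}} for the stored patent documents.
    Returns {doc_id: evaluation value} for every extracted document.
    """
    scores = {}
    for dims in sorted(query_sigs, reverse=True):          # S404: next largest dims
        for doc_id, sigs in doc_sigs.items():              # S405: extract matches
            if sigs.get(dims) == query_sigs[dims]:
                # S406: add the number of dimensions to the evaluation value
                scores[doc_id] = scores.get(doc_id, 0) + dims
        good = [d for d, s in scores.items() if s >= reference_value]
        if len(good) >= required_count:                    # S407: enough documents
            break
    # The loop also ends by itself after the smallest number of dimensions.
    return scores
```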
  • On the other hand, when the number of patent documents whose evaluation value is equal to or greater than the reference value is equal to or greater than the predetermined number, or when the number of dimensions of the first feature vector to be compared is the minimum number of dimensions, the search unit 522 generates first display data for displaying the extracted patent documents side by side (step S408), and ends the series of steps. As a result, the search unit 522 generates the first display data for displaying a plurality of patent documents corresponding to the designated data designated by the user side by side. The search unit 522 generates the first display data so that the extracted patent documents are displayed in order of publicly known date or in random order.
  • The search unit 522 may calculate, for each extracted patent document, the similarity with the designated data (the cosine similarity or Euclidean distance between the feature vector of each patent document and the feature vector of the designated data, etc.), and may display the extracted patent documents side by side in descending order of similarity.
  • the search unit 522 searches for a plurality of patent documents using the third integrated information according to the request from the user.
  • The search unit 522 may compare the first feature vector generated for the designated data with the first feature vector generated for each patent document in ascending order of the number of dimensions of the first feature vector, and extract the patent documents corresponding to the designated data.
  • In that case, the first time, the search unit 522 selects the first feature vector having the smallest number of dimensions among the plurality of first feature vectors as the first feature vector to be compared.
  • From the second time onward, the search unit 522 selects, as the first feature vector to be compared, the first feature vector having the next smallest number of dimensions after the first feature vector selected last time.
  • The process of step S406 is omitted, and in step S407, the search unit 522 determines whether or not the number of extracted patent documents is within a predetermined range, and whether or not the number of dimensions of the first feature vector to be compared is the maximum number of dimensions.
  • When the number of extracted patent documents is not within the predetermined range and the number of dimensions of the first feature vector to be compared is not the maximum number of dimensions, the search unit 522 returns the process to step S404 and repeats the processes of steps S404 to S407. On the other hand, when the number of extracted patent documents is within the predetermined range, or when the number of dimensions of the first feature vector to be compared is the maximum number of dimensions, in step S408, the search unit 522 generates the first display data for displaying the extracted patent documents side by side, and ends the series of steps.
  • By comparing the first feature vectors in ascending order of the amount of information, the search unit 522 can extract an appropriate number of patent documents earlier and can shorten the processing time of the search process.
  • the first feature vector generated by the search unit 522 is not limited to the feature vector whose element is the hash value obtained by converting the second feature vector of each patent document using LSH.
  • the plurality of first feature vectors may be feature vectors having different numbers of dimensions, and may be feature vectors having TF-IDF, Bag of Words, BM25, or the like as each element.
  • FIG. 26 is a flowchart showing an example of the operation of the update process in the search server 500.
  • the update process shown in FIG. 26 is executed in step S312 of the output process shown in FIG. 23.
  • The search unit 522 identifies the patent document designated by the user in the terminal device 10, or the patent document continuously displayed for a predetermined time or longer, from the information indicating the patent document included in the received update instruction signal (step S501).
  • The search unit 522 calculates the degree of similarity between each patent document displayed by the first display data generated in step S408 of the search process shown in FIG. 25 and the patent document specified in step S501 (step S502).
  • the search unit 522 calculates the cosine similarity or the Euclidean distance of the first feature vector of each patent document as the similarity.
  • the search unit 522 generates second display data in which the patent documents displayed by the first display data are rearranged based on the calculated similarity (step S503), and ends a series of steps.
  • the search unit 522 generates the second display data so that the patent documents displayed by the first display data are displayed side by side in descending order of similarity.
  • Among the patent documents displayed by the first display data, the search unit 522 may keep the patent documents from the first one through the patent document designated by the update instruction signal in their original order, and may rearrange, based on the similarity, only the patent documents from the one following the designated patent document through the last one. As a result, the patent documents already confirmed by the user are kept in their current order, the user does not need to confirm the already confirmed patent documents again, and the management system 1 can improve convenience for the user.
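  • The rearrangement of steps S501 to S503, including the variant that keeps the already confirmed documents in place, can be sketched as follows. Cosine similarity is used because the description names it as one option; the function names, the `keep_prefix` switch, and the plain-list feature vectors are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(display_order, vectors, designated_id, keep_prefix=False):
    """Reorder displayed documents by similarity to the designated document.

    display_order: document ids in the order shown by the first display data.
    vectors:       {doc_id: feature vector}.
    keep_prefix:   if True, documents up to and including the designated one
                   keep their original order and only the rest are sorted.
    """
    target = vectors[designated_id]

    def similarity(doc_id):
        return cosine_similarity(vectors[doc_id], target)

    if not keep_prefix:
        return sorted(display_order, key=similarity, reverse=True)
    cut = display_order.index(designated_id) + 1
    tail = sorted(display_order[cut:], key=similarity, reverse=True)
    return display_order[:cut] + tail
```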
  • The management server 300 generates the first integrated information that integrates the format information acquired from the inquiry server 100 and the bibliographic information acquired from the bibliographic server 200, and then further integrates the search information received from the AI server 400 to generate the second integrated information.
  • the management server 300 can generate the second integrated information in a short time, and can efficiently manage the information related to the patent documents.
  • The management system 1 can suppress the processing load and network load on each server, can store big data in the search servers with a small group of servers, and can reduce the system construction cost.
  • The management system 1 is developed using an agile development method, which allows development to be reviewed in short periods and gives flexibility to the development content, making it possible to run the PDCA (Plan-Do-Check-Act) cycle in short periods based on feedback from users.
  • Each patent document contains various information such as texts, classifications of technical fields, and images. If the text, the classification of the technical field, and the image are searched across different search servers, a large load is applied to each search server. For example, assume that a search is performed on 50 million patent documents and that the search keys hit 30 million documents on one search server, 20 million on another search server, and 10 million on yet another search server. In that case, the patent documents hit on each server must be collated a total of (30 million × 20 million × 10 million) times, and the load on the search servers becomes enormous. In addition, the load of processing for collating the patent documents hit by each server increases exponentially according to the number of search servers.
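  • The size of the collation count in this example can be checked with a few lines of arithmetic, taking the three hit counts as 30 million, 20 million, and 10 million. The function name is hypothetical, and the product is simply the figure given in the text, not a model of any particular join algorithm.

```python
import math


def cross_server_collations(hits_per_server):
    """Number of collations when every hit on one server is checked against every hit on the others."""
    return math.prod(hits_per_server)


hits = [30_000_000, 20_000_000, 10_000_000]
total = cross_server_collations(hits)   # 6 * 10**21 collations
```

  • Each additional search server multiplies this product by its own hit count, which is why the text describes the load as growing exponentially with the number of servers.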
  • In the management system 1, the information about one patent document is not divided and stored in a plurality of search servers; instead, all the information about one patent document is stored in one search server 500. Therefore, even when a user specifies a plurality of pieces of information to perform a search, each search server 500 searches for the plurality of designated pieces of information at once, so that the search can be performed efficiently and the total processing load of the search servers 500 can be reduced. As a result, the management system 1 can perform a smooth search without constructing a large-scale server group, and can suppress costs such as those for the various settings related to server installation, the labor required for server operation and maintenance, and the installation space. Further, the management system 1 can perform a smooth search without setting an upper limit or the like in the search conditions, and can suppress the occurrence of search omissions and the like.
  • The arrangement of the units of the management system is not limited to the example of the management system 1 shown in FIG. 1; which server each unit is arranged on may be changed as appropriate.
  • All the servers included in the management system 1 may be implemented as a single server.
  • The documents managed by the management system 1 are not limited to patent documents; they may be any documents related to classification, including non-patent documents such as papers and news articles.
  • Each learning model used by the AI server 400 need not be generated by the AI server 400 and stored in the fourth storage device 410; it may instead be generated by, and stored in, an external server.
  • In this case, the search information generation unit 421 may transmit the information to be input to each learning model to the external server and receive the output values of each learning model from the external server.
  • The search server 500 extracts the patent documents registered in the most recent predetermined period (for example, several years) from the patent documents stored in the fifth storage device 510, and extracts, as feature words, a predetermined number of the terms that appear most frequently in the full text of those patent documents. Then, for each patent document stored in the fifth storage device 510, the search server 500 generates a third feature vector whose elements are the numbers of occurrences of each feature word in the full text, and a fourth feature vector whose elements are the numbers of occurrences of each feature word within the claims.
  • When executing a concept search, the search unit 522 generates a third feature vector whose elements are the numbers of occurrences of each feature word in the full text of the patent document to be examined, and a fourth feature vector whose elements are the numbers of occurrences of each feature word within the claims of the patent document to be examined. Next, the search unit 522 calculates a first similarity between the third feature vector of the patent document to be examined and the third feature vector of each patent document stored in the fifth storage device 510. The first similarity is, for example, a normalized cross-correlation value. Next, the search unit 522 extracts, from the patent documents stored in the fifth storage device 510, the top patent documents in descending order of the first similarity, up to a first predetermined number, as patent documents similar to the patent document to be examined.
  • The search unit 522 then calculates a second similarity between the fourth feature vector of the patent document to be examined and the fourth feature vector of each extracted patent document, arranges the information indicating each extracted patent document in descending order of the second similarity, and transmits it as the search result.
  • The second similarity is, for example, a normalized cross-correlation value.
  • The search unit 522 may transmit, as the search result, only the information indicating the top patent documents among the extracted patent documents, up to a second predetermined number, in descending order of the second similarity.
  • The terminal device 10 displays the information indicating each patent document in the order given in the search result. As a result, the user can efficiently refer to the patent documents in descending order of similarity.
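The two-stage ranking in the concept search can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: documents are given as pre-tokenized word lists, the function names are hypothetical, and the "normalized cross-correlation value" is taken here in its non-mean-subtracted form, which coincides with cosine similarity.

```python
import math
from collections import Counter


def feature_vector(tokens, feature_words):
    """Count how often each feature word occurs in a token list."""
    counts = Counter(tokens)
    return [counts[w] for w in feature_words]


def normalized_correlation(a, b):
    """Normalized (non-mean-subtracted) correlation of two count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def concept_search(query, corpus, feature_words, first_n, second_n):
    """Stage 1 ranks by full-text similarity; stage 2 re-ranks the survivors by claims.

    `query` and each value of `corpus` are dicts with token lists under
    the keys "full_text" and "claims" (an assumed layout).
    """
    q_full = feature_vector(query["full_text"], feature_words)     # third feature vector
    q_claims = feature_vector(query["claims"], feature_words)      # fourth feature vector

    def first_similarity(doc_id):
        return normalized_correlation(
            q_full, feature_vector(corpus[doc_id]["full_text"], feature_words))

    def second_similarity(doc_id):
        return normalized_correlation(
            q_claims, feature_vector(corpus[doc_id]["claims"], feature_words))

    candidates = sorted(corpus, key=first_similarity, reverse=True)[:first_n]
    ranked = sorted(candidates, key=second_similarity, reverse=True)
    return ranked[:second_n]
```

In practice the per-document vectors would be precomputed and stored, as the description states for the fifth storage device 510; they are recomputed here only to keep the sketch short.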
  • The search unit 522 may divide each sentence in the claims of the patent document to be examined into a plurality of components by splitting it at commas or the like, present each component to the user, and let the user select, from the presented components, one or more components to be used for a narrowed search. When a component is shorter than a predetermined number of characters, the search unit 522 may concatenate it with the component that follows it. In addition, the search unit 522 may present each word included in the selected components to the user and let the user further select, from the presented words, one or more words to be used for the narrowed search.
  • In this case, the search unit 522 creates, for each selected component, a search formula requiring that the words included in the component, or their synonyms from a thesaurus, be included, and calculates the degree to which each patent document extracted as similar to the patent document to be examined satisfies the created search formulas.
  • The search unit 522 arranges the information indicating each extracted patent document in descending order of the calculated degree, and transmits it as the search result.
  • The search unit 522 may transmit, as the search result, only the information indicating the top patent documents among the extracted patent documents, up to a third predetermined number, in descending order of the calculated degree.
  • In the search result, the search unit 522 may associate the information indicating each patent document with information indicating whether or not the search formula corresponding to each component is satisfied.
  • The terminal device 10 displays the information indicating each patent document in the search result together with the information indicating whether or not the search formula corresponding to each component is satisfied. As a result, the user can efficiently refer to the patent documents satisfying the desired conditions.
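The component-based narrowing search can be sketched as below. Several details are assumptions made for illustration: components are split at commas and semicolons, a search formula is treated as satisfied when every selected word of the component (or one of its thesaurus synonyms) occurs in the document, and the degree is the fraction of component formulas satisfied. The function names and the `min_chars` parameter are hypothetical.

```python
import re


def split_components(claim, min_chars=10):
    """Split a claim sentence at commas/semicolons, merging short pieces forward."""
    parts = [p.strip() for p in re.split(r"[,;]", claim) if p.strip()]
    merged, carry = [], ""
    for part in parts:
        candidate = f"{carry}, {part}" if carry else part
        if len(candidate) < min_chars:
            carry = candidate          # too short: join with the following component
        else:
            merged.append(candidate)
            carry = ""
    if carry:                          # a short trailing piece has nothing to join
        merged.append(carry)
    return merged


def formula_satisfied(doc_tokens, words, thesaurus):
    """True if every word, or one of its synonyms, occurs in the document."""
    return all(({w} | thesaurus.get(w, set())) & doc_tokens for w in words)


def satisfaction(doc_text, component_words, thesaurus):
    """Return (degree, per-component flags) for one document.

    `component_words` is a list with one list of selected words per component.
    """
    tokens = set(doc_text.lower().split())
    flags = [formula_satisfied(tokens, words, thesaurus) for words in component_words]
    degree = sum(flags) / len(flags) if flags else 0.0
    return degree, flags
```

The per-component flags correspond to the information that the terminal device 10 displays alongside each patent document, and sorting documents by `degree` in descending order gives the ordering of the search result.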
  • The search unit 522 may rank each drawing included in each patent document extracted by the concept search based on its degree of matching with the image specified in the image designation box 2417 of the search screen 2400 of FIG. 24.
  • In this case, the search unit 522 acquires the degree of matching between each drawing included in the patent documents extracted by the concept search and the specified image, using a learning model pre-trained to output the degree of matching between two images when the two images are input.
  • This learning model is trained on a plurality of learning images, for example by deep learning.
  • The learning model is a neural network with a multi-layer structure composed of an input layer, an intermediate layer, and an output layer. Each node of the input layer receives, as a feature quantity, information on the number or distribution of feature points, such as corners or intersections of objects, extracted from the two images. Each node of the intermediate layer receives a weighted sum, based on predetermined weights, of the values output from the nodes of the input layer.
  • Each node of the output layer receives a weighted sum, based on predetermined weights, of the values output from the nodes of the intermediate layer.
  • The output layer, for example, outputs its input value as it is.
  • Each weight is set so that the more similar the two images are, the greater the value output by the output layer.
  • For the training, a known method such as error backpropagation is used.
  • The search unit 522 calculates feature quantities from the image in each drawing included in each patent document extracted by the concept search and from the designated image, inputs the calculated feature quantities into the learning model, and obtains the degree of matching.
  • The search unit 522 associates the information indicating each drawing included in each patent document with the degree of matching calculated for that drawing.
  • The terminal device 10 displays each drawing included in each patent document in the search result in descending order of the degree of matching.
  • As a result, the user can efficiently refer to the drawings of the patent documents that include the desired image.
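The forward pass of such a matching model and the resulting ordering of drawings can be sketched as follows. The feature extraction and the training by backpropagation are omitted; the weights are passed in as already-trained values. The ReLU activation in the intermediate layer is an assumption (the description only specifies weighted sums and an output layer that passes its input through), and the function names are hypothetical.

```python
def relu(x):
    """Intermediate-layer activation (an assumption; not specified in the description)."""
    return max(0.0, x)


def matching_degree(features, w_hidden, w_out):
    """Forward pass: input features -> intermediate layer -> single output node.

    `w_hidden` has one weight row per intermediate node; `w_out` has one
    weight per intermediate node. The output node returns its weighted-sum
    input unchanged.
    """
    hidden = [relu(sum(w * f for w, f in zip(row, features))) for row in w_hidden]
    return sum(w * h for w, h in zip(w_out, hidden))


def rank_drawings(drawing_features, w_hidden, w_out):
    """Sort drawings by matching degree, highest first.

    `drawing_features` maps a drawing identifier to the feature quantities
    computed from that drawing and the designated image.
    """
    scored = {
        drawing_id: matching_degree(features, w_hidden, w_out)
        for drawing_id, features in drawing_features.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)
```

The list returned by `rank_drawings` pairs each drawing with its degree of matching, which mirrors the association that the search unit 522 attaches before the terminal device 10 displays the drawings in descending order.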
  • The terminal device 10 may display, as text, the description corresponding to the designated drawing or the description of the drawing.
  • The terminal device 10 may sort the drawings in descending order of similarity to the drawing specified by the user.
  • FIG. 27 is a schematic diagram for explaining an example of processing by the management system 2 according to another embodiment.
  • The information processing apparatus 15 collects newly filed or registered patent documents from a predetermined patent office database at predetermined intervals and distributes them to the inquiry server 100, the bibliographic server 200, and the AI server 400.
  • The inquiry server 100 acquires format information from each new patent document collected in the predetermined period.
  • The bibliographic server 200 extracts bibliographic information from each patent document.
  • The management server 300 generates, for each patent document, first integrated information that integrates the format information and the bibliographic information.
  • While the management server 300 is generating the first integrated information, the AI server 400 generates, based on the bibliographic information or the content information of each patent document, search information that is not described in the patent document and is used for searching.
  • The management server 300 generates, for each patent document, second integrated information in which the first integrated information and the search information are integrated.
  • This second integrated information is used for registration in the search database.
  • The management system 2 otherwise operates in the same manner as the management system 1.
  • The management system 2 can thus also efficiently manage information related to patent documents.
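The overlap between search-information generation and the first integration step can be sketched with Python's standard `concurrent.futures` module. Everything here is a simplified stand-in: the server roles are reduced to plain functions with hypothetical names, a document is a dict with assumed keys, and the "search information" is just a sorted keyword list rather than the output of a learning model.

```python
from concurrent.futures import ThreadPoolExecutor


def get_format_info(doc):
    """Stand-in for the inquiry server 100."""
    return {"doc_id": doc["doc_id"], "format": doc["format"]}


def get_bibliographic_info(doc):
    """Stand-in for the bibliographic server 200."""
    return {"doc_id": doc["doc_id"], "title": doc["title"]}


def generate_search_info(doc):
    """Stand-in for the AI server 400 (a real system would call a learning model)."""
    return {"keywords": sorted(set(doc["text"].lower().split()))}


def integrate(doc):
    """Build the first and second integrated information for one document."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Search information is generated in the background ...
        search_future = pool.submit(generate_search_info, doc)
        # ... while the first integrated information is assembled.
        first = {**get_format_info(doc), **get_bibliographic_info(doc)}
        # The second integration waits for the search information to be ready.
        second = {**first, **search_future.result()}
    return first, second
```

Because the second integration only needs the finished first integrated information and the finished search information, running the two producers concurrently shortens the time until a document can be registered in the search database.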
  • 1, 2 Management system
  • 100 Inquiry server
  • 200 Bibliographic server
  • 300 Management server
  • 400 AI server
  • 500 Search server
  • 600

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Technology Law (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Databases & Information Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a management system and a management method capable of efficiently managing information relating to patent documents. The management system comprises: a first server that obtains, from a data group containing format information and content information for a plurality of patent documents, the format information for each patent document; a second server that obtains, from an external database, a document number and bibliographic information converted into a common data format, for each patent document; a third server that generates, for each patent document for which the first server has obtained format information, first integrated information in which the format information and the bibliographic information are integrated; and a fourth server that generates classification information for each patent document. After the first integrated information is completed, the third server generates second integrated information in which the first integrated information and the classification information are integrated for each patent document.
PCT/JP2020/011838 2019-07-30 2020-03-17 Système et procédé de gestion WO2021019831A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2019-139759 2019-07-30
JP2019139759 2019-07-30
JP2020-008423 2020-01-22
JP2020008423A JP6691280B1 (ja) 2019-07-30 2020-01-22 管理システム及び管理方法

Publications (1)

Publication Number Publication Date
WO2021019831A1 (fr) 2021-02-04

Family

ID=70413819

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/011838 WO2021019831A1 (fr) 2019-07-30 2020-03-17 Système et procédé de gestion

Country Status (2)

Country Link
JP (1) JP6691280B1 (fr)
WO (1) WO2021019831A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114511058A (zh) * 2022-01-27 2022-05-17 国网江苏省电力有限公司泰州供电分公司 一种用于电力用户画像的负荷元件构建方法及装置

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2022076739A (ja) * 2020-11-10 2022-05-20 株式会社リコー 配信システム、配信方法、及びプログラム
JP7029205B1 (ja) 2021-06-08 2022-03-03 株式会社AI Samurai 技術調査支援装置、技術調査支援方法、および技術調査支援プログラム
JP7029204B1 (ja) 2021-06-08 2022-03-03 株式会社AI Samurai 技術調査支援装置、技術調査支援方法、および技術調査支援プログラム
KR102524124B1 (ko) * 2022-11-18 2023-04-20 주식회사 무하유 문서 내 이미지 객체의 변형 및 표절 검증을 위한 메타데이터 생성 장치 및 그 방법
JP7391343B1 (ja) * 2023-03-15 2023-12-05 株式会社Fronteo 情報処理装置及び情報処理方法
JP7376033B1 (ja) * 2023-03-15 2023-11-08 株式会社Fronteo 情報処理装置及び情報処理方法
JP7505834B1 (ja) 2023-03-28 2024-06-25 寛 大谷 技術文献の内容を把握可能な要約の生成の応用

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003141168A (ja) * 2001-11-05 2003-05-16 Ricoh Co Ltd 特許情報検索システム、特許情報検索方法、これらの機能を実現するためのプログラム、及び記録媒体
JP2007199987A (ja) * 2006-01-26 2007-08-09 Hitachi Ltd 特許情報検索システム
JP2008516341A (ja) * 2004-10-08 2008-05-15 パテラ,インコーポレーテッド 分類された文献の分類を拡張した索引付けおよび検索
JP2009211144A (ja) * 2008-02-29 2009-09-17 Panasonic Corp データ処理システム、データ処理方法およびデータ処理プログラム
JP2014119839A (ja) * 2012-12-14 2014-06-30 Hitachi Systems Ltd 検索システム

Also Published As

Publication number Publication date
JP6691280B1 (ja) 2020-04-28
JP2021022359A (ja) 2021-02-18

Similar Documents

Publication Publication Date Title
WO2021019831A1 (fr) Système et procédé de gestion
US8046368B2 (en) Document retrieval system and document retrieval method
KR100816934B1 (ko) 문서검색 결과를 이용한 군집화 시스템 및 그 방법
Hienert et al. Digital library research in action–supporting information retrieval in sowiport
JP6033697B2 (ja) 画像評価装置
WO2015084724A1 (fr) Procédé pour désambiguïser des caractéristiques dans un texte non structuré
JP6529133B2 (ja) 複数地域でのトピックの評価を分析する装置、プログラム及び方法
CN112231555A (zh) 基于用户画像标签的召回方法、装置、设备及存储介质
CN107102976A (zh) 基于微博的娱乐新闻自动构建技术与系统
CN109948154B (zh) 一种基于邮箱名的人物获取及关系推荐系统和方法
US20120239657A1 (en) Category classification processing device and method
CN113157867A (zh) 一种问答方法、装置、电子设备及存储介质
CN111221968A (zh) 基于学科树聚类的作者消歧方法及装置
JP2006318398A (ja) ベクトル生成方法及び装置及び情報分類方法及び装置及びプログラム及びプログラムを格納したコンピュータ読み取り可能な記憶媒体
Song et al. Semi-automatic construction of a named entity dictionary for entity-based sentiment analysis in social media
JP2021144348A (ja) 情報処理装置及び情報処理方法
JP5302614B2 (ja) 施設関連情報の検索データベース形成方法および施設関連情報検索システム
Ritze Web-scale web table to knowledge base matching
CN113515699A (zh) 信息推荐方法及装置、计算机可读存储介质、处理器
CN109508557A (zh) 一种关联用户隐私的文件路径关键词识别方法
EP3103029A1 (fr) Système et procédé d'extension d'interrogation au moyen d'une langue et de variantes de la langue
CN113032549B (zh) 一种文档排序方法、装置、电子设备及存储介质
JP4428703B2 (ja) 情報検索方法及びそのシステム並びにコンピュータプログラム
Eberius et al. Publish-time data integration for open data platforms
CN103995849B (zh) 一种事件跟踪方法及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20847588

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20847588

Country of ref document: EP

Kind code of ref document: A1