WO2007060726A1

WO2007060726A1 - Document retrieval device, method, and program

Info

Publication number: WO2007060726A1
Application number: PCT/JP2005/021652
Authority: WO
Inventors: Shigeki Tanishima; Hiroko Matsui
Original assignee: Mitsubishi Space Software Co., Ltd.
Priority date: 2005-11-25
Filing date: 2005-11-25
Publication date: 2007-05-31
Also published as: JPWO2007060726A1; JP4398461B2

Abstract

To make it possible to narrow down a large number of retrieval results, the invention outputs, as information, the document terms contained in the document information, the class to which each document term belongs, and a score indicating the importance of each document term. A document retriever (110) accesses a document database (200) with an input query and acquires document information. A retrieval result generator (120) extracts document terms from the document information and generates the class and score of each document term. A retrieval result display (105) displays each extracted document term together with its class and score. The class is based on the hierarchical structure of the document terms. The score is high when a term's appearance frequency is low in the document information as a whole and high in the hit document information.

Description

Specification

Literature retrieval apparatus, method and program

Technical field

[0001] The present invention relates to a document search device, a document search method, and a document search program that support document search.

Background art

[0002] In conventional document search systems, the hit documents are merely displayed, and the user must think up and add a keyword for every re-search. In other words, when re-searching to narrow down the hit documents, the user has had to come up with keywords suitable for narrowing down. For this reason, narrowing down a search with a conventional document search system has been very difficult. In addition, a search with a simple keyword hits a huge number of documents (for example, 10,000 or more), and it is difficult to pick out the important documents from a search result listing so many documents and to obtain knowledge from them.

[0003] A conventional medical literature search system is one example of a system in which narrowing down a search is difficult. It works as follows.

Medical literature databases include MEDLINE (MEDlars on LINE), provided by the National Library of Medicine (NLM), which contains information on more than 10 million medical documents. A document search system called PubMed is provided for accessing MEDLINE: users open a web page with an interface to PubMed and search the medical document information recorded in MEDLINE.

MEDLINE's bibliographic information includes medical terms (MeSH terms: Medical Subject Headings), terms or names of proteins and compounds (Substance), RegistryNumber, and so on, and these serve as keys when searching. MeSH terms and substance information are recorded in the MeSH database, which, like MEDLINE, is accessible through PubMed. The MeSH database consists of three databases: Descriptor, Qualifier, and SCR (Supplementary Concept Records).

For each MeSH term (Descriptor term), the Descriptor database stores information such as the Heading, which is the name that uniquely identifies the MeSH term, synonyms of the Heading, a TreeNumber that identifies the position of the MeSH term in the hierarchical structure, and a description of the MeSH term. The SCR stores information on Substance terms (SCR terms) in association with the MeSH terms stored in the Descriptor database.
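MeSH TreeNumbers are dot-separated codes in which each added segment denotes one level deeper in the hierarchy, so the TreeNumbers of a term's descendants begin with its own TreeNumber followed by a dot. The Python sketch below shows those relations. It assumes TreeNumbers are handled as plain strings, and the codes in the usage lines are illustrative only, not looked up in MeSH.

```python
from typing import Optional


def tree_depth(tree_number: str) -> int:
    """Hierarchy level of a TreeNumber: one level per dot-separated segment."""
    return len(tree_number.split("."))


def parent_of(tree_number: str) -> Optional[str]:
    """TreeNumber one level up, or None for a top-level code."""
    head, sep, _ = tree_number.rpartition(".")
    return head if sep else None


def is_descendant(candidate: str, ancestor: str) -> bool:
    """True if `candidate` lies strictly below `ancestor` in the tree.

    Appending the dot prevents false matches such as "C045" under "C04".
    """
    return candidate.startswith(ancestor + ".")


# Illustrative TreeNumbers only.
print(tree_depth("C04.557.337"))        # 3
print(parent_of("C04.557.337"))         # C04.557
print(is_descendant("C04.557", "C04"))  # True
```

A single MeSH descriptor can carry more than one TreeNumber, so in practice these checks would be applied to each TreeNumber of a term.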

Besides those used in the medical literature search system, other databases include the OMIM (Online Mendelian Inheritance in Man) database, which lets users access information on genes and diseases from a web page, and the chemical information database (CAS), which stores chemical information keyed by RegistryNumber (CAS Registry Number), the identifier of a chemical substance.

Patent Document 1: Japanese Patent Laid-Open No. 2003-044481

Disclosure of the invention

Problems to be solved by the invention

[0004] In view of the problems described above, the present invention aims, for example in searches over a large amount of document information, to support document search by narrowing down that document information so that useful document information can be obtained. At the same time, it aims to promote new discoveries in medicine and molecular biology concerning the relationships between molecules and diseases.

Means for solving the problem

[0005] The document retrieval apparatus of the present invention is connected to a document database that stores a collection of document information including thesaurus terms, and to a classification database that stores each thesaurus term in association with a classification into genes, diseases, compounds, and tissues. It comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms included in the acquired document information acquired by the document information acquisition unit; a classification acquisition unit that acquires, for each thesaurus term extracted by the thesaurus term extraction unit, the classification of that thesaurus term from the classification database; and an output unit that outputs the thesaurus terms extracted by the thesaurus term extraction unit to an output device in association with the classifications acquired by the classification acquisition unit.
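The acquire, extract, classify, and output flow of [0005] can be sketched in Python as a minimal in-memory illustration under stated assumptions: each document is reduced to the set of thesaurus terms it contains, the search is assumed to be an AND over the query terms (the text does not specify the matching rule), and `DOCUMENT_DB`, `CLASSIFICATION_DB`, the function names, and all term names are hypothetical stand-ins for the two databases and the units.

```python
from collections import Counter

# Hypothetical stand-in for the document database: document id -> thesaurus terms it contains.
DOCUMENT_DB = {
    "d1": {"GENE_A", "DISEASE_X", "COMPOUND_P"},
    "d2": {"GENE_A", "DISEASE_X", "TISSUE_T"},
    "d3": {"GENE_B", "DISEASE_Y"},
}

# Hypothetical stand-in for the classification database: thesaurus term -> class.
CLASSIFICATION_DB = {
    "GENE_A": "gene",
    "GENE_B": "gene",
    "DISEASE_X": "disease",
    "DISEASE_Y": "disease",
    "COMPOUND_P": "compound",
    "TISSUE_T": "tissue",
}


def acquire_documents(query_terms):
    """Document information acquisition: ids of documents containing every query term."""
    return [doc_id for doc_id, terms in DOCUMENT_DB.items() if set(query_terms) <= terms]


def extract_terms(hit_ids):
    """Thesaurus term extraction: each term with the number of hit documents containing it."""
    counts = Counter()
    for doc_id in hit_ids:
        counts.update(DOCUMENT_DB[doc_id])
    return counts


def classify_and_output(term_counts):
    """Classification acquisition plus output: (term, class, hit count), sorted by term."""
    return sorted(
        (term, CLASSIFICATION_DB.get(term, "unclassified"), count)
        for term, count in term_counts.items()
    )


hits = acquire_documents({"DISEASE_X"})
for row in classify_and_output(extract_terms(hits)):
    print(row)
```

Grouping the rows by their class column is one simple way a display could present the result.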

[0006] The document retrieval apparatus of the present invention is connected to a document database that stores a collection of document information including thesaurus terms, and to a classification database that stores each thesaurus term in association with a classification into genes, diseases, compounds, and tissues that is based on the hierarchical structure of the thesaurus terms. It comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms included in the acquired document information acquired by the document information acquisition unit; a classification acquisition unit that acquires, for each thesaurus term extracted by the thesaurus term extraction unit, the classification of that thesaurus term from the classification database; and an output unit that outputs the thesaurus terms extracted by the thesaurus term extraction unit to an output device in association with the classifications acquired by the classification acquisition unit.
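One possible reading of a classification "based on the hierarchical structure" is to derive each term's class from the branch of the tree in which its TreeNumber sits. The sketch below assumes a hypothetical prefix table: A, C, and D are the MeSH top-level categories for Anatomy, Diseases, and Chemicals and Drugs, while `X01` is a made-up placeholder for a gene branch, not a real MeSH code. The most specific matching prefix is tried first.

```python
from typing import Iterable, Optional, Set

# Hypothetical prefix -> class table. "X01" is a placeholder, not a MeSH code.
CLASS_BY_PREFIX = {
    "A": "tissue",
    "C": "disease",
    "D": "compound",
    "X01": "gene",
}


def classify_tree_number(tree_number: str) -> Optional[str]:
    """Walk from the full TreeNumber up to its category letter; the first match wins."""
    segments = tree_number.split(".")
    candidates = [".".join(segments[:i]) for i in range(len(segments), 0, -1)]
    candidates.append(tree_number[0])  # the top-level category letter, e.g. "C"
    for prefix in candidates:
        if prefix in CLASS_BY_PREFIX:
            return CLASS_BY_PREFIX[prefix]
    return None


def classify_term(tree_numbers: Iterable[str]) -> Set[str]:
    """A term may have several TreeNumbers, so it may fall into several classes."""
    return {c for tn in tree_numbers if (c := classify_tree_number(tn)) is not None}


print(classify_tree_number("C04.557"))         # disease
print(classify_term(["C04.557", "D12.776"]))   # a set with 'disease' and 'compound'
```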

[0007] The document search device of the present invention is connected to a document database that stores a set of document information including thesaurus terms, and comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms included in the acquired document information acquired by the document information acquisition unit; a term score calculation unit that calculates, for each thesaurus term extracted by the thesaurus term extraction unit, a score equal to the proportion of the pieces of acquired document information that include that thesaurus term, divided by the proportion of the pieces of document information in the set that include it; and an output unit that outputs the thesaurus terms extracted by the thesaurus term extraction unit to an output device in association with the scores calculated by the term score calculation unit.
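Written out, the score of a term t is (hits containing t / all hits) divided by (documents in the set containing t / all documents in the set). A minimal Python sketch follows; the function and parameter names are hypothetical, and the worked numbers are illustrative.

```python
def term_score(hits_with_term: int, hits_total: int,
               docs_with_term: int, docs_total: int) -> float:
    """Share of the hit documents containing the term, divided by its share of the whole set.

    (hits_with_term / hits_total) / (docs_with_term / docs_total) is rearranged
    so that only one floating-point division is performed.
    """
    if hits_total <= 0 or docs_with_term <= 0:
        raise ValueError("need at least one hit and a term present in the set")
    return (hits_with_term * docs_total) / (hits_total * docs_with_term)


# A term in 50 of 10,000 documents but in 20 of 200 hits is strongly enriched ...
print(term_score(20, 200, 50, 10_000))      # 20.0
# ... while a term in half of all documents and half of the hits scores 1.0.
print(term_score(100, 200, 5_000, 10_000))  # 1.0
```

The example matches the behavior stated in the abstract: the score is high for a term that is rare in the whole set but frequent among the hits.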

[0008] Further, a storage device is provided that stores a first threshold for the number of pieces of acquired document information and a second threshold for the number of pieces of acquired document information including the thesaurus term. The term score calculation unit obtains the first threshold and the second threshold from the storage device and, based on the first threshold, the second threshold, the number of pieces of acquired document information, and the number of pieces of acquired document information including the thesaurus term, calculates a low score for a thesaurus term when the number of pieces of acquired document information is less than the first threshold and the number of pieces of acquired document information including that thesaurus term is less than the second threshold.
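The text says only that such scores are calculated "low", not how. The sketch below assumes a multiplicative damping factor; the two threshold values and the factor are hypothetical tuning parameters. The example shows why a safeguard of this kind is useful: with very few hits, a single occurrence of a rare term already yields a large ratio.

```python
FIRST_THRESHOLD = 100  # hypothetical: minimum number of acquired documents
SECOND_THRESHOLD = 5   # hypothetical: minimum number of acquired documents containing the term
DAMPING = 0.1          # hypothetical: how much lower a "low" score is


def thresholded_score(hits_with_term, hits_total, docs_with_term, docs_total,
                      first_threshold=FIRST_THRESHOLD,
                      second_threshold=SECOND_THRESHOLD,
                      damping=DAMPING):
    """Ratio score, damped when both the hit set and the term's hit count are small."""
    raw = (hits_with_term * docs_total) / (hits_total * docs_with_term)
    if hits_total < first_threshold and hits_with_term < second_threshold:
        return raw * damping
    return raw


# One hit out of three for a term found in 2 of 10,000 documents: the raw ratio is
# about 1666.7, which the damping reduces to about 166.7.
print(thresholded_score(1, 3, 2, 10_000))
# Enough hits: the score is left unchanged.
print(thresholded_score(20, 200, 50, 10_000))  # 20.0
```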

[0009] The document search device of the present invention is connected to a document database that stores a set of document information including thesaurus terms, and comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms included in the acquired document information acquired by the document information acquisition unit and stores them in a storage device; a duplicate term number calculation unit that calculates how many of the thesaurus terms extracted this time by the thesaurus term extraction unit are the same as thesaurus terms extracted in the past by the thesaurus term extraction unit; and an output unit that outputs the number of thesaurus terms calculated by the duplicate term number calculation unit to an output device.
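A minimal sketch of the duplicate term count, assuming that "extracted in the past" means the union of the terms extracted in all earlier searches of a session. A set held by a hypothetical `DuplicateTermCounter` object stands in for the storage device, and the term names are illustrative.

```python
class DuplicateTermCounter:
    """Remembers terms from earlier searches and counts how many recur in the current one."""

    def __init__(self):
        self._past_terms = set()  # stand-in for the storage device

    def record(self, extracted_terms):
        """Return how many of this search's terms were extracted before, then store them."""
        current = set(extracted_terms)
        duplicates = len(current & self._past_terms)
        self._past_terms |= current
        return duplicates


counter = DuplicateTermCounter()
print(counter.record({"GENE_A", "DISEASE_X", "COMPOUND_P"}))  # 0 (first search)
print(counter.record({"GENE_A", "DISEASE_X", "TISSUE_T"}))    # 2
```

If "the past" should mean only the immediately preceding search, `record` would replace `_past_terms` with `current` instead of taking the union.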

[0010] The document retrieval apparatus of the present invention is connected to a document database that stores a set of document information including thesaurus terms and to a hierarchy database that stores hierarchy information of the thesaurus terms, and comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms included in the acquired document information acquired by the document information acquisition unit; a thesaurus hierarchy acquisition unit that acquires hierarchy information about the thesaurus terms extracted by the thesaurus term extraction unit; a designated hierarchy input unit that inputs a designation of a hierarchy level of the thesaurus terms; a thesaurus term selection unit that selects, from the thesaurus terms extracted by the thesaurus term extraction unit, the thesaurus terms corresponding to the hierarchy level input by the designated hierarchy input unit, based on the hierarchy information acquired by the thesaurus hierarchy acquisition unit; and an output unit that outputs the thesaurus terms selected by the thesaurus term selection unit to an output device.
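Assuming the hierarchy information is a list of MeSH-style TreeNumbers per term and that a term's level is the number of dot-separated segments in a TreeNumber, the selection step could look like this. `HIERARCHY_DB` and the term names are hypothetical, and a term with several TreeNumbers is selected if any of them is at the designated level.

```python
# Hypothetical hierarchy database: thesaurus term -> its TreeNumbers.
HIERARCHY_DB = {
    "TERM_TOP": ["C04"],
    "TERM_MID": ["C04.557"],
    "TERM_LOW": ["C04.557.337"],
    "TERM_MULTI": ["C04.588", "C17.800.508"],
}


def levels_of(term):
    """Hierarchy levels of a term: one per TreeNumber, counted in dot-separated segments."""
    return {len(tn.split(".")) for tn in HIERARCHY_DB.get(term, [])}


def select_by_level(extracted_terms, level):
    """Keep the extracted terms that have at least one TreeNumber at the designated level."""
    return [term for term in extracted_terms if level in levels_of(term)]


extracted = ["TERM_TOP", "TERM_MID", "TERM_LOW", "TERM_MULTI", "NOT_IN_DB"]
print(select_by_level(extracted, 2))  # ['TERM_MID', 'TERM_MULTI']
print(select_by_level(extracted, 3))  # ['TERM_LOW', 'TERM_MULTI']
```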

[0011] The document search apparatus of the present invention is connected to a document database that stores a set of document information including thesaurus terms, and comprises: a search method input unit that inputs, for a search sentence including a thesaurus term, a designation of either a narrow search method, whose search condition is the thesaurus term included in the search sentence, or a broad search method, whose search condition is the thesaurus term included in the search sentence together with the thesaurus terms below it in the hierarchy; a search sentence input unit that inputs a search sentence including a thesaurus term from an input device; a document information acquisition unit that acquires, from the document database, document information corresponding to the search sentence input by the search sentence input unit, according to the search condition of the search method input by the search method input unit; a thesaurus term extraction unit that extracts the thesaurus terms included in the acquired document information acquired by the document information acquisition unit; and an output unit that outputs the thesaurus terms extracted by the thesaurus term extraction unit to an output device.
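The two search methods differ only in which terms make up the search condition. In the Python sketch below (hypothetical data and names), the broad method adds every term whose TreeNumber lies under one of the query term's TreeNumbers, and a document is assumed to match if it contains any term of the condition. This resembles the way PubMed "explodes" a MeSH term to include its narrower terms, but it is not PubMed's implementation.

```python
# Hypothetical hierarchy database: thesaurus term -> TreeNumbers.
HIERARCHY_DB = {
    "PARENT": ["C04"],
    "CHILD": ["C04.557"],
    "GRANDCHILD": ["C04.557.337"],
    "UNRELATED": ["D12"],
}

# Hypothetical document database: document id -> thesaurus terms.
DOCUMENT_DB = {
    "d1": {"PARENT"},
    "d2": {"CHILD"},
    "d3": {"GRANDCHILD"},
    "d4": {"UNRELATED"},
}


def search_condition(term, method):
    """Terms forming the condition: the term itself, plus its descendants for 'broad'."""
    if method == "narrow":
        return {term}
    if method != "broad":
        raise ValueError(f"unknown search method: {method!r}")
    roots = HIERARCHY_DB.get(term, [])
    below = {
        other
        for other, tree_numbers in HIERARCHY_DB.items()
        if any(tn.startswith(root + ".") for tn in tree_numbers for root in roots)
    }
    return {term} | below


def acquire_documents(term, method):
    """Ids of documents containing at least one term of the search condition."""
    condition = search_condition(term, method)
    return sorted(doc_id for doc_id, terms in DOCUMENT_DB.items() if terms & condition)


print(acquire_documents("PARENT", "narrow"))  # ['d1']
print(acquire_documents("PARENT", "broad"))   # ['d1', 'd2', 'd3']
```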

[0012] The document retrieval apparatus of the present invention is connected to a document database that stores a set of document information including thesaurus terms and to a synonym database that stores synonyms of the thesaurus terms, and comprises: a search term input unit that inputs a plurality of search words; a synonym term acquisition unit that acquires, for each of the search words input by the search term input unit, the thesaurus terms of which that search word is a synonym, and displays them together on a display device; a search term designation unit that designates, as search words, thesaurus terms from among those acquired and displayed by the synonym term acquisition unit; a search sentence input unit that inputs, from an input device, a search sentence including the thesaurus terms designated by the search term designation unit; a document information acquisition unit that acquires, from the document database, document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms included in the acquired document information acquired by the document information acquisition unit; and an output unit that outputs the thesaurus terms extracted by the thesaurus term extraction unit to an output device.
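A sketch of the synonym step, assuming the synonym database maps each thesaurus term to its synonyms and that the designated terms are joined with AND to form the search sentence (the text does not say how they are combined). The entries are illustrative toy data, and all function names are hypothetical.

```python
# Illustrative toy synonym database: thesaurus term -> synonyms (entry terms).
SYNONYM_DB = {
    "Neoplasms": ["Tumors", "Cancer", "Neoplasia"],
    "Myocardial Infarction": ["Heart Attack"],
}


def synonym_candidates(search_words):
    """For each search word, the thesaurus terms it names, matched case-insensitively.

    All candidates are returned in one mapping so they can be displayed together.
    """
    candidates = {}
    for word in search_words:
        key = word.casefold()
        candidates[word] = sorted(
            term
            for term, synonyms in SYNONYM_DB.items()
            if key == term.casefold() or key in {s.casefold() for s in synonyms}
        )
    return candidates


def build_search_sentence(designated_terms):
    """Combine the thesaurus terms the user designated into one search sentence."""
    return " AND ".join(designated_terms)


shown = synonym_candidates(["cancer", "heart attack"])
print(shown)  # {'cancer': ['Neoplasms'], 'heart attack': ['Myocardial Infarction']}
chosen = [terms[0] for terms in shown.values() if terms]  # stand-in for the user's designation
print(build_search_sentence(chosen))  # Neoplasms AND Myocardial Infarction
```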

[0013] The document search method of the document search apparatus of the present invention is a document search method of a document search apparatus that searches for documents by connecting to a document database that stores a collection of document information including thesaurus terms and to a classification database that stores each thesaurus term in association with a classification into genes, diseases, compounds, and tissues. In the method, a search sentence input unit performs a search sentence input process of inputting a search sentence from an input device; a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms included in the acquired document information acquired by the document information acquisition unit; a classification acquisition unit performs a classification acquisition process of acquiring, for each thesaurus term extracted by the thesaurus term extraction unit, the classification of that thesaurus term from the classification database; and an output unit performs an output process of outputting the thesaurus terms extracted by the thesaurus term extraction unit to an output device in association with the classifications acquired by the classification acquisition unit.

[0014] The document search method of the document search apparatus of the present invention is a document search method of a document search apparatus that searches for documents by connecting to a document database that stores a collection of document information including thesaurus terms and to a classification database that stores each thesaurus term in association with a classification into genes, diseases, compounds, and tissues that is based on the hierarchical structure of the thesaurus terms.

In the method, a search sentence input unit performs a search sentence input process of inputting a search sentence from an input device; a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms included in the acquired document information acquired by the document information acquisition unit; a classification acquisition unit performs a classification acquisition process of acquiring, for each thesaurus term extracted by the thesaurus term extraction unit, the classification of that thesaurus term from the classification database; and an output unit performs an output process of outputting the thesaurus terms extracted by the thesaurus term extraction unit to an output device in association with the classifications acquired by the classification acquisition unit.

[0015] The document search method of the document search device of the present invention is a document search method of a document search device that searches for documents by connecting to a document database that stores a collection of document information including thesaurus terms. In the method, a search sentence input unit performs a search sentence input process of inputting a search sentence from an input device; a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms included in the acquired document information acquired by the document information acquisition unit; a term score calculation unit performs a term score calculation process of calculating, for each thesaurus term extracted by the thesaurus term extraction unit, a score equal to the proportion of the pieces of acquired document information that include that thesaurus term, divided by the proportion of the pieces of document information in the collection that include it; and an output unit performs an output process of outputting the thesaurus terms extracted by the thesaurus term extraction unit to an output device in association with the scores calculated by the term score calculation unit.

[0016] The document search method of the document search device further uses a storage device that stores a first threshold for the number of pieces of acquired document information and a second threshold for the number of pieces of acquired document information including the thesaurus term. In the term score calculation process, the term score calculation unit obtains the first threshold and the second threshold from the storage device and, based on the first threshold, the second threshold, the number of pieces of acquired document information, and the number of pieces of acquired document information including the thesaurus term, calculates a low score for a thesaurus term when the number of pieces of acquired document information is less than the first threshold and the number of pieces of acquired document information including that thesaurus term is less than the second threshold.

[0017] The document search method of the document search device of the present invention is a document search method of a document search device that searches for documents by connecting to a document database that stores a collection of document information including thesaurus terms. In the method, a search sentence input unit performs a search sentence input process of inputting a search sentence from an input device; a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms included in the acquired document information acquired by the document information acquisition unit and storing them in a storage device; a duplicate term number calculation unit performs a duplicate term number calculation process of calculating how many of the thesaurus terms extracted this time by the thesaurus term extraction unit are the same as the thesaurus terms extracted in the past by the thesaurus term extraction unit and stored in the storage device; and an output unit performs an output process of outputting the number of thesaurus terms calculated by the duplicate term number calculation unit to an output device.

[0018] The document search method of the document search apparatus of the present invention is a document search method of a document search apparatus that searches for documents by connecting to a document database that stores a collection of document information including thesaurus terms and to a hierarchy database that stores hierarchy information of the thesaurus terms. In the method, a search sentence input unit performs a search sentence input process of inputting a search sentence from an input device; a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms included in the acquired document information acquired by the document information acquisition unit; a thesaurus hierarchy acquisition unit performs a thesaurus hierarchy acquisition process of acquiring, from the hierarchy database, hierarchy information about the thesaurus terms extracted by the thesaurus term extraction unit; a designated hierarchy input unit performs a designated hierarchy input process of inputting a designation of a hierarchy level of the thesaurus terms; a thesaurus term selection unit performs a thesaurus term selection process of selecting, from the thesaurus terms extracted by the thesaurus term extraction unit, the thesaurus terms corresponding to the hierarchy level input by the designated hierarchy input unit, based on the hierarchy information acquired by the thesaurus hierarchy acquisition unit; and an output unit performs an output process of outputting the thesaurus terms selected by the thesaurus term selection unit to an output device.

[0019] The document search method of the document search device of the present invention is a document search method of a document search device that searches for documents by connecting to a document database that stores a collection of document information including thesaurus terms. In the method, a search method input unit performs a search method input process of inputting, for a search sentence including a thesaurus term, a designation of either a narrow search method, whose search condition is the thesaurus term included in the search sentence, or a broad search method, whose search condition is the thesaurus term included in the search sentence together with the thesaurus terms below it in the hierarchy; a search sentence input unit performs a search sentence input process of inputting a search sentence including a thesaurus term; a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, document information corresponding to the search sentence input by the search sentence input unit, according to the search condition of the search method input by the search method input unit; a thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms included in the acquired document information acquired by the document information acquisition unit; and an output unit performs an output process of outputting the thesaurus terms extracted by the thesaurus term extraction unit to an output device.

[0020] The document search method of the document search device of the present invention is a document search method of a document search device that searches for documents by connecting to a document database that stores a collection of document information including thesaurus terms and to a synonym database that stores synonyms of the thesaurus terms. In the method, a search term input unit performs a search term input process of inputting a plurality of search words from an input device; a synonym term acquisition unit performs a synonym term acquisition process of acquiring from the synonym database, for each of the search words input by the search term input unit, the thesaurus terms of which that search word is a synonym; a search term designation unit performs a designation process of designating, as search words, thesaurus terms from among those acquired by the synonym term acquisition unit; and a search sentence input unit performs a search sentence input process of inputting a search sentence including the thesaurus terms designated by the search term designation unit.

Further, a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms included in the acquired document information acquired by the document information acquisition unit; and an output unit performs an output process of outputting the thesaurus terms extracted by the thesaurus term extraction unit to an output device.

[0021] A document search program of the present invention causes a computer to execute any one of the document search methods described above.

[0022] The document retrieval apparatus of the present invention is connected to a document database that stores a set of document information and to a classification database that stores each document term in association with a specific classification, and comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, document information corresponding to the search sentence input by the search sentence input unit; a document term extraction unit that extracts the document terms included in the acquired document information acquired by the document information acquisition unit; a classification acquisition unit that acquires, for each document term extracted by the document term extraction unit, the classification of that document term from the classification database; and an output unit that outputs the document terms extracted by the document term extraction unit to an output device in association with the classifications acquired by the classification acquisition unit.

[0023] The document retrieval apparatus of the present invention is connected to a document database that stores a set of document information, and comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, document information corresponding to the search sentence input by the search sentence input unit; a document term extraction unit that extracts the document terms included in the acquired document information acquired by the document information acquisition unit; a term score calculation unit that calculates, for each document term extracted by the document term extraction unit, a score equal to the proportion of the pieces of acquired document information that include that document term, divided by the proportion of the pieces of document information in the set that include it; and an output unit that outputs the document terms extracted by the document term extraction unit to an output device in association with the scores calculated by the term score calculation unit.

[0024] Furthermore, a storage device is provided that stores a first threshold for the number of pieces of acquired document information and a second threshold for the number of pieces of acquired document information including the document term. The term score calculation unit obtains the first threshold and the second threshold from the storage device and, based on the first threshold, the second threshold, the number of pieces of acquired document information, and the number of pieces of acquired document information including the document term, calculates a low score for a document term when the number of pieces of acquired document information is less than the first threshold and the number of pieces of acquired document information including that document term is less than the second threshold.

[0025] The document retrieval apparatus of the present invention is connected to a document database that stores a set of document information, and comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, document information corresponding to the search sentence input by the search sentence input unit; a document term extraction unit that extracts the document terms included in the acquired document information acquired by the document information acquisition unit and stores them in a storage device; a duplicate term number calculation unit that calculates how many of the document terms extracted this time by the document term extraction unit are the same as the document terms extracted in the past by the document term extraction unit and stored in the storage device; and an output unit that outputs the number of document terms calculated by the duplicate term number calculation unit to an output device.

[0026] The document search method of the document search apparatus of the present invention is a document search method of a document search apparatus that searches for documents by connecting to a document database that stores a set of document information and to a classification database that stores each document term in association with a specific classification. In the method, a search sentence input unit performs a search sentence input process of inputting a search sentence from an input device; a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, document information corresponding to the search sentence input by the search sentence input unit; a document term extraction unit performs a document term extraction process of extracting the document terms included in the acquired document information acquired by the document information acquisition unit; a classification acquisition unit performs a classification acquisition process of acquiring, for each document term extracted by the document term extraction unit, the classification of that document term from the classification database; and an output unit performs an output process of outputting the document terms extracted by the document term extraction unit to an output device in association with the classifications acquired by the classification acquisition unit.

[0027] A document search method of the present invention is a document search method of a document search apparatus that searches for documents while connected to a document database storing a set of document information. In the method, a search sentence input unit performs a search sentence input process of inputting a search sentence from an input device; a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, the document information corresponding to the input search sentence; a document term extraction unit performs a document term extraction process of extracting the document terms included in the acquired document information; a term score calculation unit performs a term score calculation process of calculating, for each extracted document term, a score equal to the ratio of the proportion of acquired document information that includes the document term (out of all acquired document information) to the proportion of document information that includes the document term out of all document information in the set; and an output unit performs an output process of outputting each extracted document term, associated with the score calculated by the term score calculation unit, to an output device.
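The term score of [0027] can be written as a ratio of two proportions: with H acquired (hit) documents, h of which contain the term, and N documents in the whole set, n of which contain the term, the score is (h/H)/(n/N). The minimal sketch below assumes these four counts are already available; the function name is hypothetical. It multiplies before dividing, which gives the same ratio with a single division.

```python
def term_score(hits_with_term: int, hits_total: int,
               docs_with_term: int, docs_total: int) -> float:
    """Return (hits_with_term / hits_total) / (docs_with_term / docs_total).

    Hypothetical helper illustrating the ratio described in [0027].
    """
    if hits_total <= 0 or docs_with_term <= 0:
        raise ValueError("hits_total and docs_with_term must be positive")
    # Algebraically equal to the ratio of the two proportions.
    return (hits_with_term * docs_total) / (hits_total * docs_with_term)


# 10 of 100 hits contain the term; 1,000 of 1,000,000 documents in the set do.
print(term_score(10, 100, 1_000, 1_000_000))  # prints 100.0
```

A term that is rare in the whole set but common among the hits gets a high score, which matches the behaviour described in the abstract.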

[0028] The document search method of the document search apparatus further uses a storage device that stores a first threshold for the number of acquired document information and a second threshold for the number of acquired document information that includes the document term. In the term score calculation process, the term score calculation unit acquires the first threshold and the second threshold from the storage device and, based on the two thresholds, the number of acquired document information, and the number of acquired document information that includes the document term, calculates a lower term score when the number of acquired document information is less than the first threshold and the number of acquired document information that includes the document term is less than the second threshold.
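[0028] only says that the score is calculated low when both counts fall below their thresholds; it does not say how much lower. The sketch below assumes a simple multiplicative penalty. The threshold values, the `penalty` factor, and the function name are hypothetical and chosen only for illustration.

```python
def adjusted_term_score(hits_with_term: int, hits_total: int,
                        docs_with_term: int, docs_total: int,
                        first_threshold: int, second_threshold: int,
                        penalty: float = 0.5) -> float:
    """Term score of [0027], lowered when the hit counts are too small to trust.

    The multiplicative `penalty` is an assumption; the source does not fix the amount.
    """
    if hits_total <= 0 or docs_with_term <= 0:
        raise ValueError("hits_total and docs_with_term must be positive")
    score = (hits_with_term * docs_total) / (hits_total * docs_with_term)
    # Both conditions of [0028] must hold for the score to be lowered.
    if hits_total < first_threshold and hits_with_term < second_threshold:
        score *= penalty
    return score


# Only 10 hits (< 50) and only 2 of them contain the term (< 5): score is halved.
print(adjusted_term_score(2, 10, 1_000, 1_000_000,
                          first_threshold=50, second_threshold=5))  # prints 100.0
```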

[0029] A document search method of the present invention is a document search method of a document search apparatus that searches for documents while connected to a document database storing a set of document information. In the method, a search sentence input unit performs a search sentence input process of inputting a search sentence from an input device; a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, the document information corresponding to the input search sentence; a document term extraction unit performs a document term extraction process of extracting the document terms included in the acquired document information and storing them in a storage device; a duplicate term count calculation unit performs a duplicate term count calculation process of calculating how many of the document terms extracted this time are identical to document terms that the document term extraction unit extracted in the past and stored in the storage device; and an output unit performs an output process of outputting the number of document terms calculated by the duplicate term count calculation unit to an output device.

[0030] A document search program of the present invention causes a computer to execute any one of the document search methods described above.

[0031] The document retrieval apparatus of the present invention is connected to a document database that stores a set of document information including thesaurus terms and to a classification database that stores each thesaurus term in association with a classification of gene, disease, compound, or tissue. The apparatus includes: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms included in the acquired document information; a classification acquisition unit that acquires, for each thesaurus term extracted by the thesaurus term extraction unit, its classification from the classification database; a term score calculation unit that calculates, for each extracted thesaurus term, a score equal to the ratio of the proportion of acquired document information that includes the thesaurus term (out of all acquired document information) to the proportion of document information that includes the thesaurus term out of all document information in the set; and an output unit that outputs each extracted thesaurus term, together with the classification acquired by the classification acquisition unit and the score calculated by the term score calculation unit, to an output device as information related to relationships among molecules, diseases, and the like.
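The output of [0031] pairs each extracted thesaurus term with its classification and score. A minimal sketch, assuming the classification database can be modelled as a dictionary from term to the one-letter codes used in FIG. 4 (G, D, C, T) and that scores have already been computed. The function name, the sample scores, and the sample entries other than “Neoplasms” are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# One-letter classification codes as used in FIG. 4.
CLASS_LABELS = {"G": "gene", "D": "disease", "C": "compound", "T": "tissue"}

Row = Tuple[str, Optional[str], Optional[float]]


def build_output_rows(extracted_terms: List[str],
                      classification_db: Dict[str, str],
                      scores: Dict[str, float]) -> List[Row]:
    """Pair each extracted term with its classification label and term score."""
    rows: List[Row] = []
    for term in extracted_terms:
        code = classification_db.get(term)  # None if the term is unclassified
        label = CLASS_LABELS.get(code) if code else None
        rows.append((term, label, scores.get(term)))
    return rows


rows = build_output_rows(["Neoplasms", "Liver", "Humans"],
                         {"Neoplasms": "D", "Liver": "T"},
                         {"Neoplasms": 27.7, "Liver": 3.0, "Humans": 1.0})
for row in rows:
    print(row)
```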

[0032] A document retrieval method of the present invention is a document search method of a document search apparatus that searches for documents while connected to a document database storing a set of document information that includes thesaurus terms and to a classification database storing each thesaurus term in association with a classification of gene, disease, compound, or tissue. In the method, a search sentence input unit performs a search sentence input process of inputting a search sentence from an input device; a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, the document information corresponding to the input search sentence; a thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms included in the acquired document information; a classification acquisition unit performs a classification acquisition process of acquiring, for each extracted thesaurus term, its classification from the classification database; a term score calculation unit performs a term score calculation process of calculating, for each extracted thesaurus term, a score equal to the ratio of the proportion of acquired document information that includes the thesaurus term (out of all acquired document information) to the proportion of document information that includes the thesaurus term out of all document information in the set; and an output unit performs an output process of outputting each extracted thesaurus term, together with the classification acquired by the classification acquisition unit and the score calculated by the term score calculation unit, to an output device as information related to relationships among molecules, diseases, and the like.

[0033] A document search program according to the present invention causes a computer to execute the document search method.

Effects of the invention

[0034] According to the present invention, the thesaurus terms (for example, MeSH terms) included in the hit document information can be output together with their classifications (gene, disease, compound, tissue) and term scores as information for narrowing down medical document information. This supports a narrowed-down document search for obtaining useful document information.

[0035] Further, according to the present invention, the classification and term score can be output, together with the thesaurus terms included in the hit document information, as information related to relationships among molecules, diseases, and the like, which can encourage medical and molecular-biological discoveries.

[0036] Further, since the number of thesaurus terms extracted this time that were also extracted in past searches can be output, the transition of the search can be grasped effectively. This supports a narrowed-down document search for obtaining useful document information.

[0037] Further, the thesaurus terms of a specified hierarchy level among the thesaurus terms included in the hit document information can be output as information for narrowing down medical document information, which supports a narrowed-down document search for obtaining useful document information.
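One way to picture the hierarchy selection of [0037]: a MeSH number is a dot-separated path, so the number of segments gives the level of a term in the tree. The sketch below assumes that this level is what the user specifies and that a term is selected if any of its MeSH numbers lies at that level. The function names are hypothetical, and the tree numbers shown are illustrative and should be checked against the current MeSH tree.

```python
from typing import Dict, List


def tree_level(mesh_number: str) -> int:
    """Level in the MeSH tree: 'C04' -> 1, 'C04.588' -> 2, and so on."""
    return len(mesh_number.split("."))


def select_by_level(term_mesh_numbers: Dict[str, List[str]], level: int) -> List[str]:
    """Keep terms that have at least one MeSH number at the requested level."""
    return [term for term, numbers in term_mesh_numbers.items()
            if any(tree_level(n) == level for n in numbers)]


TERMS = {
    "Neoplasms": ["C04"],
    "Neoplasms by Site": ["C04.588"],
    "Breast Neoplasms": ["C04.588.180"],
}
print(select_by_level(TERMS, 2))  # prints ['Neoplasms by Site']
```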

[0038] Further, document information is searched by the search method specified from narrow search and broad search, and the thesaurus terms included in the hit document information are output as information for narrowing down medical document information, which supports a narrowed-down document search for obtaining useful document information.
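The difference between the narrow and broad searches of [0038] can be sketched with MeSH numbers: a narrow search uses only the given term, while a broad search also adds every term whose MeSH number lies below one of the given term's MeSH numbers. The prefix test on MeSH numbers, the function name, and the sample tree numbers are assumptions made for this illustration.

```python
from typing import Dict, List


def search_terms(term: str, term_mesh_numbers: Dict[str, List[str]], broad: bool) -> List[str]:
    """Narrow search: the term only. Broad search: the term plus every term below it."""
    if not broad:
        return [term]
    # Appending "." matches only MeSH numbers strictly below the term's own numbers.
    prefixes = tuple(n + "." for n in term_mesh_numbers.get(term, []))
    below = [other for other, numbers in term_mesh_numbers.items()
             if other != term and any(n.startswith(prefixes) for n in numbers)]
    return [term] + below


TERMS = {
    "Neoplasms": ["C04"],
    "Neoplasms by Site": ["C04.588"],
    "Breast Neoplasms": ["C04.588.180"],
}
print(search_terms("Neoplasms by Site", TERMS, broad=False))  # prints ['Neoplasms by Site']
print(search_terms("Neoplasms by Site", TERMS, broad=True))   # prints ['Neoplasms by Site', 'Breast Neoplasms']
```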

[0039] Further, the thesaurus terms that have the input search word as a synonym are acquired, and the corresponding document information can be searched at once with the thesaurus terms specified from among those acquired, which supports a narrowed-down document search.
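The synonym lookup of [0039] can be sketched as a reverse lookup over the term DB: given a search word, collect every representative term whose heading or synonym list contains it. Case-insensitive matching and the function name are assumptions; the sample entry reuses the “Neoplasms” synonyms listed for FIG. 4.

```python
from typing import Dict, List


def representative_terms(search_word: str, term_db: Dict[str, List[str]]) -> List[str]:
    """Return representative terms whose heading or synonyms match search_word.

    term_db maps each representative term (Heading) to its synonymous terms.
    """
    word = search_word.casefold()
    return [heading for heading, synonyms in term_db.items()
            if word == heading.casefold()
            or any(word == s.casefold() for s in synonyms)]


TERM_DB = {"Neoplasms": ["cancers", "Benign"]}
candidates = representative_terms("Cancers", TERM_DB)
print(candidates)  # prints ['Neoplasms']
```

The user would then choose, from `candidates`, the terms to be used together in a single search.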

[0040] Further, according to the present invention, the classification and term score can be output, together with the document terms included in the hit document information, as information for narrowing down the document information, which supports a narrowed-down document search for obtaining useful document information.

[0041] Further, according to the present invention, the classification and term score can be output, together with the document terms included in the hit document information, as information related to relationships among molecules, diseases, and the like, which can encourage medical and molecular-biological discoveries.

[0042] Further, since the number of document terms extracted this time that are identical to document terms extracted in the past can be output, the transition of the search can be grasped effectively. This supports a narrowed-down document search for obtaining useful document information, and the benefit is not limited to medical documents.

BEST MODE FOR CARRYING OUT THE INVENTION

[0043] Embodiment 1.

The following describes a form in which the classification and term score are output, together with the document terms included in the hit document information, as information for narrowing down the document information. In the following explanation, “document information” is exemplified by “medical literature information recorded in MEDLINE”, and “thesaurus term” by “MeSH term”.

FIG. 1 is a configuration diagram of the document search device 100 according to the first embodiment.

The configuration of the document retrieval apparatus 100 according to the first embodiment will be described below with reference to FIG. 1.

The document retrieval apparatus 100 includes the following.

The user environment setting unit 101 inputs, from the input device, various setting information of the document search apparatus 100 for each user (hereinafter referred to as environment setting information), and stores the input environment setting information in the user information DB (database) 191.

For each user of the document search apparatus 100, the user information DB 191 stores in a storage device information about that user's use of the document search apparatus 100, such as a login ID (identifier), a password, and environment setting information (hereinafter referred to as user information), and manages access to the user information.

The MEDLINE update information acquisition unit 102 periodically accesses the FTP (File Transfer Protocol) site of the NLM, acquires information about the document information newly recorded (or scheduled to be recorded) in MEDLINE 200 (hereinafter referred to as update information), updates the term DB (database) 192 described below based on the acquired update information, and notifies users of updates to the bibliographic information in MEDLINE 200.

For each MeSH term, the term DB 192 stores in the storage device information related to that term (hereinafter referred to as MeSH term information), such as: the Heading (hereinafter referred to as the representative term); the Synonyms of the Heading (hereinafter referred to as synonymous terms); the MeSH Tree Number (hereinafter referred to as the MeSH number); an explanation of the MeSH term; the classification of medical information to which the term belongs (gene, disease, compound, tissue, etc.; hereinafter referred to as the classification); and the number of documents related to the MeSH term among all documents recorded in MEDLINE (the number of documents per term in all documents). The term DB 192 also manages access to the MeSH term information.

The search result storage unit 193 is a storage device that stores the search result information of MEDLINE 200 (hereinafter referred to as MEDLINE result information), including the document information corresponding to a query (search sentence) (hereinafter referred to as hit document information), the total number of documents in MEDLINE 200, and the number of hit documents. It also stores the search result information of the document search apparatus 100 (hereinafter referred to as search result information), namely the MeSH terms (including SCR terms) extracted from the hit document information, information other than the representative terms described below (hereinafter referred to as additional information), and information indicating the transition of searches (hereinafter referred to as search transition information).

The display change unit 103 inputs, from the input device, information about the display contents of the search results (hereinafter referred to as display designation information), and outputs information based on the display designation information (hereinafter referred to as display change information) to change the display of the search results. The login unit 104 inputs, from the input device, the login ID and password specified by the user (hereinafter referred to as login information), and controls the user's access to the document search apparatus 100 based on the input login information and the user information stored in the user information DB 191.

[0046] The document search unit 110 generates a query based on user designation, accesses the MEDLINE 200 using the generated query, and acquires MEDLINE result information from the MEDLINE 200.

[0047] The search result generation unit 120 generates search result information based on the MEDLINE result information.

The generated search result information is stored in the search result storage unit 193.

[0048] The search result display unit 105 displays the search result on the display device based on the MEDLINE result information and the search result information, and changes the display of the search result based on the display change information.

FIG. 2 is a configuration diagram of the document search unit 110 according to the first embodiment.

The configuration of the document search unit 110 in the first embodiment will be described below based on FIG. 2.

[0050] The document search unit 110 includes the following.

The synonym search unit 111 inputs, from the input device, the term that the user first specifies for searching the document information of MEDLINE 200 (hereinafter referred to as the search term), acquires from the term DB 192 the representative terms of the MeSH terms that have the input search term as their representative term or a synonymous term, and inputs, from the input device, the terms that the user specifies from among the acquired representative terms for use in the query to access MEDLINE 200 (hereinafter referred to as designated terms).

The search term input unit 112 inputs a search term from the input device.

The representative term acquisition unit 113 acquires from the term DB 192 the representative terms of the MeSH terms that have the search term as their representative term or a synonymous term.

The specified term acquisition unit 114 inputs the designated term from the input device.

The query generation unit 115 generates a query based on the designated terms and the environment setting information. The MEDLINE access unit 116 inputs the query from the input device, accesses MEDLINE 200 using the input query, acquires MEDLINE result information from MEDLINE 200, and stores the acquired MEDLINE result information in the search result storage unit 193.

FIG. 3 is a configuration diagram of the search result generation unit 120 in the first embodiment.

The configuration of the search result generation unit 120 in the first embodiment will be described below based on FIG. 3.

[0052] The search result generation unit 120 includes the following.

The MeSH term extraction unit 121 extracts MeSH terms from the hit document information (here, extracting a MeSH term also includes acquiring the representative term corresponding to the extracted MeSH term).
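Hit document information from MEDLINE is commonly exchanged in the MEDLINE text format, in which each MeSH heading appears on a line tagged `MH  - `, optionally with a leading `*` (major topic) and a `/subheading` suffix. Assuming the hit document information arrives in that format, a minimal extraction sketch could look like the following. The function name and the sample record are hypothetical, and a production parser would also need to handle wrapped continuation lines.

```python
from typing import List

MH_TAG = "MH  - "  # MEDLINE text format: tag padded to 4 characters, then "- "


def extract_mesh_headings(medline_text: str) -> List[str]:
    """Return the distinct MeSH headings of one MEDLINE-format record, in order."""
    headings: List[str] = []
    for line in medline_text.splitlines():
        if not line.startswith(MH_TAG):
            continue
        value = line[len(MH_TAG):].strip()
        # Drop any "/subheading" part, then the "*" that marks a major topic.
        heading = value.split("/", 1)[0].lstrip("*")
        if heading not in headings:
            headings.append(heading)
    return headings


record = """PMID- 0000000
TI  - Example title
MH  - Humans
MH  - *Neoplasms/genetics
MH  - Neoplasms/therapy
"""
print(extract_mesh_headings(record))  # prints ['Humans', 'Neoplasms']
```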

The additional information generation unit 122 generates additional information based on the MEDLINE result information. The classification acquisition unit 123 acquires the classification of each extracted term from the term DB 192.

The term score calculation unit 124 calculates a score (term score) indicating the importance of the extracted term for the hit document.

The MeSH number acquisition unit 125 acquires the MeSH number of each extracted term from the term DB 192. The search transition information generation unit 126 generates search transition information based on the current MEDLINE result information, extracted term information, and additional information, and on the past MEDLINE result information, extracted term information, and additional information.

FIG. 4 is a table showing MeSH term information stored and managed in the term DB 192 according to the first embodiment.

The term information table stored and managed by the term DB 192 in the first embodiment will be described below with reference to FIG. 4.

The term DB 192 stores in advance, as MeSH term information, a table (hereinafter referred to as the MeSH term table) holding the “representative term (Heading)”, “MeSH number”, “term description”, “classification”, “number of documents per term in all documents”, and “synonymous terms” of each MeSH term, and manages access to the MeSH term information. In FIG. 4, “Entry ID” identifies the MeSH term information of each MeSH term. “Classification” indicates which medical information classification, gene (genes), disease (diseases), compound (chemicals), or tissue (tissues), the MeSH term indicated by the “representative term (Heading)” belongs to. Gene is indicated as “G”, disease as “D”, compound as “C”, and tissue as “T”.

[0054] In FIG. 4, the MeSH term whose representative term is “Neoplasms” is identified by the MeSH number “C04”, is classified as “disease (D)”, is related to “135284” of all the documents in MEDLINE (the number of documents per term in all documents), and has synonymous terms such as “cancers” and “Benign”. The figure also shows that the MeSH term information with “Neoplasms” as its representative term is set in the entry identified by entry ID “9028”.
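The MeSH term table of FIG. 4 maps naturally onto one record per entry. Below is a sketch of such a record in Python, filled with the “Neoplasms” entry described in [0054]. The class and field names are hypothetical, the term description is left empty because the text does not give it, and holding MeSH numbers as a list is an assumption (a MeSH descriptor can appear at more than one place in the tree).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MeshTermEntry:
    """One row of the MeSH term table (hypothetical field names)."""
    entry_id: int              # "Entry ID"
    heading: str               # representative term (Heading)
    mesh_numbers: List[str]    # MeSH number(s)
    description: str           # term description
    classification: str        # "G", "D", "C" or "T"
    docs_per_term: int         # number of documents per term in all documents
    synonyms: List[str] = field(default_factory=list)

    def names(self) -> List[str]:
        """Heading followed by its synonymous terms."""
        return [self.heading, *self.synonyms]


neoplasms = MeshTermEntry(
    entry_id=9028, heading="Neoplasms", mesh_numbers=["C04"],
    description="",  # not given in the text
    classification="D", docs_per_term=135284,
    synonyms=["cancers", "Benign"],
)
print(neoplasms.names())  # prints ['Neoplasms', 'cancers', 'Benign']
```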

[0055] Next, a method for setting MeSH term information in the term DB 192 will be described.

“Entry ID” is set to a unique ID for each MeSH term whose MeSH term information is stored and managed.

The information stored in the MeSH database is set in “Heading”, “MeSH number”, “Term description”, and “Synonym term”.

“Number of documents per term in all documents” is set to the number of hit documents obtained when MEDLINE 200 is accessed with a query that combines, with the OR (logical sum) condition, the MeSH term set in “Heading” and all MeSH terms set in “Synonymous terms” of the entry.
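The “number of documents per term in all documents” comes from one OR query over the heading and its synonyms. The sketch below builds that query string and delegates the actual MEDLINE access to a caller-supplied `search_count` function, so no particular MEDLINE client API is assumed. The function names and the double-quoting style of the query are assumptions.

```python
from typing import Callable, List


def docs_per_term_query(heading: str, synonyms: List[str]) -> str:
    """OR together the heading and all synonyms (double quotes are an assumed syntax)."""
    return " OR ".join(f'"{term}"' for term in [heading, *synonyms])


def docs_per_term(heading: str, synonyms: List[str],
                  search_count: Callable[[str], int]) -> int:
    """search_count is a hypothetical callable that returns the hit count for a query."""
    return search_count(docs_per_term_query(heading, synonyms))


print(docs_per_term_query("Neoplasms", ["cancers", "Benign"]))
# prints "Neoplasms" OR "cancers" OR "Benign"
```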

Next, a method for setting “classification” will be described.

FIGS. 5, 6, 7, 8, and 9 are MeSH Descriptor tree diagrams.

In FIGS. 5 to 9, among the Descriptor terms at MeSH numbers [D05], [D06], [D08] to [D12], [D14.600], [D17.350], [D24], and [D27.505] and the Descriptor terms in their lower layers, the lowermost Descriptor terms are set in the term DB 192 with “G” as MeSH terms classified as genes (hereinafter referred to as gene classification terms). In addition, SCR terms mapped to Descriptor terms classified as genes are also set in the term DB 192 with “G” as gene classification terms.

In addition, the Descriptor terms at MeSH numbers [C01] to [C04] and [C05] to [C23] and the Descriptor terms in their lower layers are set in the term DB 192 with “D” as MeSH terms classified as diseases (hereinafter referred to as disease classification terms).

In addition, among the Descriptor terms at MeSH numbers [D01] to [D04], [D07], [D14.100], [D15], [D16], [D17.025], [D17.050], [D18], [D19], [D26], [D27.720], and [D27.888] and the Descriptor terms in their lower layers, the Descriptor terms for which a RegistryNumber is recorded are set in the term DB 192 with “C” as MeSH terms classified as compounds (hereinafter referred to as compound classification terms). SCR terms that are mapped to a Descriptor term classified as a compound and for which a RegistryNumber is recorded are also set in the term DB 192 with “C” as compound classification terms.

Also, the Descriptor terms at MeSH numbers [A01] to [A08], [A10], [A11.063] to [A11.270], [A11.284.149], [A11.284.180], [A11.284.295], [A11.284.420], [A11.284.430.214], [A11.284.430.429], [A11.299] to [A11.868], [A12] to [A14], [C04.182], [C04.557.435] to [C04.557.665], and [C04.588] and the Descriptor terms in their lower layers are set in the term DB 192 with “T” as MeSH terms classified as tissues (hereinafter referred to as tissue classification terms).

[0057] Hereinafter, the MeSH numbers [D05], [D06], [D08] to [D12], [D14.600], [D17.350], [D24], and [D27.505] for the gene classification terms; [C01] to [C04] and [C05] to [C23] for the disease classification terms; [D01] to [D04], [D07], [D14.100], [D15], [D16], [D17.025], [D17.050], [D18], [D19], [D26], [D27.720], and [D27.888] for the compound classification terms; and [A01] to [A08], [A10], [A11.063] to [A11.270], [A11.284.149], [A11.284.180], [A11.284.295], [A11.284.420], [A11.284.430.214], [A11.284.430.429], [A11.299] to [A11.868], [A12] to [A14], [C04.182], [C04.557.435] to [C04.557.665], and [C04.588] for the tissue classification terms are each referred to as a classification index number (specific MeSH number), which serves as an index for classifying MeSH terms.

FIG. 10 is a diagram showing a screen (search result screen) on which the search result display unit 105 in Embodiment 1 displays the search result.

In FIG. 10, (1) is the main menu, (2) is the query, (3) is the submenu, (4) is the result, and (5) is the history.

[0059] The main menu (1) consists of buttons for switching the main function. The main functions are the document search function (Search), synonym search function (Dictionary), alert function (Alert), environment setting function (Customize), help function (Help), and logout function (Logout).

When the user presses a button of the main menu (1) using the mouse 903 (an example of an input device) and the pressed button is the document search function, the document search apparatus 100 executes the document search unit 110 to perform a document search. If the pressed button is the synonym search function, the document search apparatus 100 executes the synonym search unit 111 to perform a synonym search; if the pressed button is the alert function or the environment setting function, it executes the user environment setting unit 101 to set the environment. If the pressed button is the help function, the document search apparatus 100 displays the description information of the document search apparatus 100 on the CRT display device 901 (an example of a display device). It is assumed that this description information is stored in a storage device included in the document search apparatus 100. If the pressed button is the logout function, the document search apparatus 100 does not execute subsequent document search processing.

The query (2) is a text box for inputting a search condition expression (a query, search term, or designated term) for accessing MEDLINE and searching for documents. The document search apparatus 100 stores in the storage device the search condition expression that the user enters in the query (2) from the keyboard 902 (an example of an input device).

[0061] The submenu (3) consists of tabs for switching the display method of the document search results. The display methods are the extracted term classification display (GDCT), the hit document display (Article), and the extracted term MeSH number display (MeSH).

When the user presses a tab of the submenu (3) using the mouse 903 (an example of an input device), the document retrieval apparatus 100 executes the display change unit 103 and changes the display of the document search result according to the information of the pressed tab (display designation information).

[0062] The result (4) is a display area showing a document search result by the display method specified in the submenu (3).

[0063] The history (5) is a display area indicating search transition.

Here, the document retrieval apparatus 100 is an example of a document retrieval apparatus that retrieves documents.

MEDLINE 200 is an example of a document database that stores a set of document information (for example, all document information recorded in MEDLINE 200). The term DB 192 is an example of a classification database that stores each thesaurus term in association with a classification of gene, disease, compound, or tissue, of a hierarchy database that stores the hierarchy information (MeSH numbers) of thesaurus terms, and of a synonym database that stores synonyms of thesaurus terms. The MEDLINE access unit 116 is an example of a search sentence input unit that inputs a search sentence from an input device and of a document information acquisition unit that acquires document information corresponding to the input search sentence.

In addition, the MeSH term extraction unit 121 is an example of a document term extraction unit that extracts the document terms (for example, MeSH terms, including the representative terms corresponding to the extracted terms) included in the acquired document information (hit document information).

The classification acquisition unit 123 is an example of a classification acquisition unit that acquires the classification of each thesaurus term from the classification database (term DB 192).

The search result display unit 105 is an example of an output unit that outputs an extraction term (including a representative term and MeSH number corresponding to the extraction term), a classification, a term score, and a search transition to an output device.

Further, the term score calculation unit 124 is an example of a term score calculation unit that calculates, for each thesaurus term, the ratio of the proportion of acquired document information that includes the thesaurus term (out of all acquired document information) to the proportion of document information that includes the thesaurus term out of all document information in the set.

In addition, the search transition information generation unit 126 is an example of a duplicate term count calculation unit that calculates how many of the document terms (for example, MeSH terms) extracted this time are identical to document terms extracted in the past.

The display change unit 103 is also an example of a specified hierarchy input unit that inputs the specification of a thesaurus term hierarchy level, a thesaurus hierarchy acquisition unit that acquires the hierarchy information of the thesaurus terms, and a thesaurus term selection unit that selects thesaurus terms based on the hierarchy level input by the specified hierarchy input unit and the hierarchy information acquired by the thesaurus hierarchy acquisition unit.

In addition, the document search unit 110 is an example of a search method input unit that inputs the specification of a search method: either a narrow search, which uses the thesaurus terms included in the search sentence as the search condition, or a broad search, which uses the thesaurus terms included in the search sentence and the thesaurus terms below them as the search condition. Here, the environment setting information included in the user information includes setting information for selecting either the narrow search or the broad search (described below).

The document search unit 110 inputs the search method information from the user information DB 191.

The representative term acquisition unit 113 is an example of a synonym term acquisition unit that acquires, from the synonym database, the thesaurus terms that have the search term as a synonym and displays them on the display device. The designated term acquisition unit 114 is an example of a search term designation unit that designates, from among the displayed thesaurus terms, the thesaurus terms to be used as search terms.

FIG. 11 is a diagram illustrating an appearance of the document search device 100 according to the first embodiment.

In FIG. 11, the document retrieval device 100 includes a system unit 910, a CRT (Cathode Ray Tube) display device 901, a keyboard (K/B) 902, a mouse 903, a compact disc device (CDD) 905, a printer device 906, and a scanner device 907, which are connected by cables.

Further, the document retrieval apparatus 100 is connected to a fax machine 932 and a telephone 931 via cables, and is connected to the Internet 940 via a local area network (LAN) 942 and a web server 941. The document retrieval apparatus 100 is also connected, via the LAN 942 or the Internet 940, to the MEDLINE database server 200 (hereinafter referred to as MEDLINE200), the OMIM database server 210 (hereinafter referred to as OMIM210), and the MeSH database server 220 (hereinafter referred to as MeSH220).

FIG. 12 is a hardware configuration diagram of the document search device 100 according to the first embodiment.

In FIG. 12, the document retrieval apparatus 100 includes a CPU (Central Processing Unit) 911 that executes programs. The CPU 911 is connected via a bus 912 to the ROM 913, RAM 914, communication board 915, CRT display device 901, K/B 902, mouse 903, FDD (Flexible Disk Drive) 904, magnetic disk device 920, CDD 905, printer device 906, and scanner device 907. The CPU 911 executes the operations of each “unit” described below; in other words, the CPU 911 performs processing such as output, communication, and display.

The RAM 914 is an example of a volatile memory. The ROM 913, FDD 904, CDD 905, magnetic disk device 920, and optical disk device are examples of nonvolatile memory. These are examples of storage equipment, storage devices, or storage units. The user information DB 191, term DB 192, search result storage unit 193, MEDLINE 200, OMIM 210, and MeSH 220 store the user information, MeSH term information, MEDLINE result information and search result information, MEDLINE information, OMIM information, and MeSH information, respectively, in the storage device. However, the MEDLINE 200, OMIM 210, and MeSH 220 databases may be external databases connected to the Internet 940. The communication board 915 is connected to the fax machine 932, the telephone 931, the LAN 942, and the like. For example, the communication board 915, K/B 902, scanner device 907, FDD 904, and storage device are examples of input equipment, input devices, or input units. The user environment setting unit 101, MEDLINE update information acquisition unit 102, display change unit 103, and login unit 104 input the environment setting information, update information, display designation information, and login information, respectively, from the input device. The synonym search unit 111 also inputs the search term and the designated term from the input device. Further, for example, the communication board 915, CRT display device 901, and storage device are examples of output equipment, output devices, or output units. The synonym search unit 111 and the search result display unit 105 output the representative terms and the search results, respectively, to the output device.

Here, the communication board 915 is not limited to connection to the LAN 942 and may be connected directly to the Internet 940 or to a WAN (Wide Area Network) such as ISDN. When it is connected directly to the Internet 940 or to a WAN such as ISDN, the document retrieval apparatus 100 connects to that WAN directly, and the web server 941 is unnecessary.

The magnetic disk device 920 stores an operating system (OS) 921, a window system 922, a program group 923, and a file group 924. The program group 923 is executed by the CPU 911, the OS 921, and the window system 922.

The program group 923 stores programs that execute the functions described as “~ unit”, “~ DB”, and “~ storage unit” in the following description of the embodiments. The programs are read and executed by the CPU 911.

In the file group 924, data on results explained in the following description of the embodiments with expressions such as “determination of ~”, “result of determining ~”, “calculation of ~”, “result of calculating ~”, “processing of ~”, and “result of processing ~”, together with data explained with expressions such as “~ information”, are stored as “~ files”. Information such as buttons, tabs, list boxes, and text boxes designated by the user on the screen is also stored as “~ files”.

In addition, the arrows in the flowcharts in the following description of the embodiments mainly indicate the input and output of data. For this input and output, the data is recorded on the magnetic disk device 920, an FD (Flexible Disk Cartridge), an optical disc, a CD (Compact Disc), an MD (MiniDisc), a DVD (Digital Versatile Disc), or other recording media, or it is transmitted through a signal line or other transmission medium.

In addition, what is described as “~ unit”, “~ DB”, and “~ storage unit” in the following description of the embodiments may be realized by firmware stored in the ROM 913. Alternatively, it may be implemented by software alone, hardware alone, a combination of software and hardware, or a combination of these with firmware.

[0070] A program for carrying out the embodiments described below may be stored in a recording device that uses the magnetic disk device 920, an FD, an optical disk, a CD, an MD, a DVD, or other recording media.

FIG. 13 is a flowchart showing a document search process performed by the document search apparatus 100 according to the first embodiment.

An overview of the document search process of the document search apparatus 100 according to Embodiment 1 is described below with reference to FIG. 13.

[0072] First, the login unit 104 performs a process that logs the user in to the document search apparatus 100 (S101).

Next, the document search unit 110 performs a document search process for accessing the MEDLINE 200 and acquiring MEDLINE search information in accordance with a user instruction (S102). Next, the search result generation unit 120 performs search result generation processing for generating a search result based on the MEDLINE search information acquired by the document search unit 110 (S103).

Then, the search result display unit 105 performs a search result display process that displays the search result on the CRT display device 901 (an example of a display device), based on the search result information generated by the search result generation unit 120 (S104).

Thereafter, the document search unit 110 repeats S102 to S104 according to the user's instructions, thereby supporting a narrowed-down search.

After the narrowed-down search is complete, the user can acquire the narrowed-down useful document information by specifying the hit document display (Article) in the submenu (3) of the search result screen. At this time, the display change unit 103 acquires the MEDLINE result information from the search result storage unit 193, and the search result display unit 105 displays the hit document information indicated by the MEDLINE result information on the display device as the narrowed-down useful document information.

Next, details of the document search processing (S101 to S104) of the document search apparatus 100 will be described below.

First, the login process (S101) of the login unit 104 will be described.

In the login process (S101), the login unit 104 displays text boxes for entering a login ID and a password on the CRT display device 901 (an example of a display device) and prompts the user to enter them. The login unit 104 then inputs the login ID and password entered by the user from the keyboard 902 (an example of an input device). Next, it outputs the input login ID and accesses the user information DB 191. As user information, the user information DB 191 stores the registered login ID of each registered user in a storage device, together with the registered password, the user's environment setting information, and the like, all associated with the login ID. Using the login ID output from the login unit 104 as a search key, the user information DB 191 searches for user information whose registered login ID matches the search key. It then outputs, as the search result, whether such user information exists and, if it does, the matching user information. When the search result output from the user information DB 191 indicates “the user information is present”, the login unit 104 compares the password entered by the user with the password in the user information included in the search result. If the two passwords match, the login unit 104 permits the user to access the document search device 100, and the subsequent document search process is performed according to the user's instructions. If the search result indicates “the user information is not present”, or if the entered password does not match the password in the user information included in the search result, the login unit 104 does not permit the user to access the document search device 100, and the subsequent document search process is not executed.
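The login check above can be sketched in Python as follows. This is a minimal illustration under stated assumptions. The dictionary passed as `user_db` is a hypothetical in-memory stand-in for the user information DB 191, and `UserInfo` and `login` are names invented for this example. The constant-time `hmac.compare_digest` comparison is a substitution of ours, not something the text specifies. Storing the plain password mirrors the description, but a real deployment would store salted password hashes instead.

```python
import hmac
from dataclasses import dataclass, field


@dataclass
class UserInfo:
    """Hypothetical shape of one record in the user information DB 191."""
    login_id: str
    password: str
    settings: dict = field(default_factory=dict)  # environment setting information


def login(user_db: dict[str, UserInfo], login_id: str, password: str) -> bool:
    """Return True when access is permitted, False when it is refused."""
    user = user_db.get(login_id)  # search by registered login ID
    if user is None:
        return False  # "the user information is not present"
    # Compare the entered password with the registered one in constant time.
    return hmac.compare_digest(user.password.encode(), password.encode())
```

For example, `login({"alice": UserInfo("alice", "pw")}, "alice", "pw")` permits access, while a wrong password or an unknown login ID returns `False`. In either refusal case, the subsequent search process would not run.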

Next, the document search process (S102) of the document search unit 110 will be described.

In the document search process (S102), when the login unit 104 has permitted the user to access the document search device 100 and the user specifies “Search” in the function (1) of the search result screen, the MEDLINE access unit 116 of the document search unit 110 inputs the query specified by the user from the input device. For example, the user enters the query in the query (2) of the search result screen from the keyboard 902 (an example of an input device). Alternatively, the query may be read from a file stored in a storage device (an example of an input device) in which the user-specified query is set. The MEDLINE access unit 116 then accesses MEDLINE 200 with the input query, acquires MEDLINE result information from MEDLINE 200, and stores it in the search result storage unit 193.
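The text does not specify how MEDLINE 200 is accessed. As an illustration only, the sketch below assumes the present-day NCBI E-utilities ESearch service, which searches PubMed (the public interface to MEDLINE) and can return JSON. The function names are hypothetical. The network call itself is left out so the example stays self-contained.

```python
import json
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint: an assumed modern stand-in for MEDLINE 200.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query: str, retmax: int = 20) -> str:
    """Build an ESearch request that runs `query` against PubMed and asks for JSON."""
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_esearch(body: str) -> tuple[int, list[str]]:
    """Extract the total hit count and the returned PubMed IDs from a JSON reply."""
    result = json.loads(body)["esearchresult"]
    return int(result["count"]), list(result["idlist"])
```

A caller could fetch the URL with `urllib.request.urlopen(url).read().decode()`, subject to NCBI's usage policies. It would then store the parsed count and IDs as MEDLINE result information.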

Next, the search result generation process (S103) of the search result generation unit 120 will be described.

First, the MeSH term extraction unit 121 of the search result generation unit 120 performs MeSH term extraction processing for extracting the MeSH term from the hit document information included in the MEDLINE result information acquired by the document search unit 110.

Then, the additional information generation unit 122 of the search result generation unit 120 performs additional information generation processing for generating additional information for each MeSH term extracted by the MeSH term extraction unit 121.

FIG. 14 is a flowchart showing a flow of MeSH term extraction processing of the MeSH term extraction unit 121 in the first embodiment.

The MeSH term extraction process of the MeSH term extraction unit 121 in the search result generation process (S103) of the search result generation unit 120 is described below with reference to FIG. 14.

First, the MeSH term extraction unit 121 acquires hit document information from the search result storage unit 193 (S201).

Next, so that S203 to S205 are performed for each piece of acquired hit document information, the MeSH term extraction unit 121 determines whether all of the acquired hit document information has been processed (S202).

When there is unprocessed hit document information, the MeSH term extraction unit 121 extracts MeSH terms as extraction terms from the target hit document information (any unprocessed hit document information) (S203).

Next, so that S205 is performed for each extracted term, the MeSH term extraction unit 121 determines whether all of the extracted terms have been processed (S204).

[0079] When there is an unprocessed extraction term, the MeSH term extraction unit 121 adds the representative term corresponding to the extraction term to be processed (any one of the unprocessed extraction terms) to an extracted term list, which it generates. This list holds the representative terms corresponding to the extracted terms. The number of appearances of each representative term is also set in the extracted term list. This number indicates how many of the hit documents relate to the MeSH term (hereinafter, the number of documents per term in the hit documents). To do this, the MeSH term extraction unit 121 outputs the extraction term to be processed and accesses the term DB 192. As shown in FIG. 4, the term DB 192, which stores MeSH term information, uses the extraction term output from the MeSH term extraction unit 121 as a search key. It searches for MeSH term information (entries) whose representative term or synonym matches the search key and outputs the matching MeSH term information as the search result. The MeSH term extraction unit 121 compares the representative term indicated by the MeSH term information output by the term DB 192 (hereinafter, the representative term to be processed) with the representative terms already in the extracted term list, and determines whether the same representative term is already present. If it is not present, the MeSH term extraction unit 121 adds the representative term to be processed to the extracted term list and sets its number of documents per term in the hit documents to “1”. If it is already present, the MeSH term extraction unit 121 does not add the representative term again. Instead, it increments the number of documents per term in the hit documents stored for the existing representative term. When the term DB 192 outputs multiple pieces of MeSH term information, the MeSH term extraction unit 121 sets the representative term and the number of documents per term in the hit documents in the same way for the representative term indicated by each piece of MeSH term information. Then, S204 is performed again (S205).

In S204, when there is no unprocessed extraction term (all extracted extraction terms have been processed), the MeSH term extraction unit 121 processes S202 again.

In S202, if there is no unprocessed hit document information (all acquired hit document information has been processed), the MeSH term extraction unit 121 ends the MeSH term extraction process.
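The extraction loop (S201 to S205) can be sketched in Python as follows. This is a minimal illustration, not the apparatus's implementation. The mapping `representatives_of` is a hypothetical in-memory stand-in for the term DB 192, and the function name is invented for this example. One deliberate choice: each document's representative terms are collected into a set before counting. A document carrying both a representative term and one of its synonyms is therefore counted once, so the value remains a count of documents.

```python
def build_extracted_term_list(hit_documents, representatives_of):
    """Sketch of S201-S205: count, per representative term, the hit documents containing it.

    hit_documents: one list of extracted MeSH terms per hit document.
    representatives_of: hypothetical stand-in for the term DB 192, mapping a term
        (representative term or synonym) to every representative term whose entry lists it.
    """
    extracted = {}  # representative term -> number of documents per term in the hit documents
    for doc_terms in hit_documents:                  # S202/S203: one hit document at a time
        reps_in_doc = set()
        for term in doc_terms:                       # S204/S205: one extracted term at a time
            reps_in_doc.update(representatives_of.get(term, ()))
        for rep in reps_in_doc:
            # New representative term: start at 1. Existing one: increment.
            extracted[rep] = extracted.get(rep, 0) + 1
    return extracted
```

With illustrative data where "Tumors" is a synonym of "Neoplasms", the hit documents `[["Neoplasms", "Tumors"], ["Tumors", "Insulin"]]` yield `{"Neoplasms": 2, "Insulin": 1}`.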

FIG. 15 is a flowchart showing a flow of additional information generation processing of the additional information generation unit 122 in the first embodiment.

The additional information generation process of the additional information generation unit 122 in the search result generation process (S103) of the search result generation unit 120 is described below with reference to FIG. 15.

[0081] So that S302 to S304 are performed for every representative term set in the extracted term list generated by the MeSH term extraction unit 121, the additional information generation unit 122 determines whether all of those representative terms have been processed (S301).

If there is an unprocessed representative term, the additional information generation unit 122 outputs the representative term to be processed (any one of the unprocessed representative terms) and accesses the term DB 192. As shown in FIG. 4, the term DB 192, which stores MeSH term information, uses the representative term output from the additional information generation unit 122 as a search key. It searches for MeSH term information (entries) whose representative term matches the search key and outputs the matching MeSH term information as the search result. The additional information generation unit 122 then sets the description of the term indicated by that MeSH term information in the extracted term list, in association with the representative term to be processed (S302).

Next, the classification acquisition unit 123 of the additional information generation unit 122 performs a classification acquisition process. This process sets the classification (an example of additional information) indicated by the MeSH term information output from the term DB 192 in the extracted term list, in association with the representative term to be processed (S303).

The term score calculation unit 124 of the additional information generation unit 122 then performs a term score calculation process. This process calculates the term score of the representative term to be processed and sets the calculated term score (an example of additional information) in the extracted term list, in association with that representative term (S304).

If there is no unprocessed representative term in S301, the additional information generation unit 122 stores the extracted term list in which the additional information is set in the search result storage unit 193 as search result information (S305).

FIG. 16 is a flowchart showing the flow of term score calculation processing of the term score calculation unit 124 in the first embodiment.

Details of the term score calculation process (S304) of the term score calculation unit 124 are described below with reference to FIG. 16.

First, the term score calculation unit 124 calculates the term score S (t) of the representative term t to be processed using the following formula (S401).

[0084] [Equation 1]

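$$S(t) = \log_2 \frac{p_t}{q_t} = \log_2 p_t - \log_2 q_t$$

$$p_t = \frac{h_t}{H}, \qquad q_t = \frac{m_t}{M}$$

where h_t is the number of documents per term in the hit documents, H is the number of hit documents, m_t is the number of documents per term in all documents, and M is the total number of documents in MEDLINE.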

The total number of MEDLINE documents and the number of hit documents used in the above formula are obtained from the MEDLINE result information stored in the search result storage unit 193. The number of documents per term in all documents is obtained from the MeSH term information output by the term DB 192 in S302. The number of documents per term in the hit documents is obtained from the extracted term list.

In the above formula, the log odds (LOD, log of odds) is calculated as the term score. The density (p_t) of the representative term to be processed in the hit document information is taken as the standard distribution, and its density (q_t) in all document information is taken as the measured distribution. The odds are, however, only one example of scoring, and the term score may be calculated by another scoring method.

Therefore, the lower the appearance frequency of the MeSH term in all the MEDLINE200 document information and the higher the appearance frequency of the MeSH term in the hit document information, the higher the term score of the MeSH term. In other words, the higher the term score, the more important the MeSH term is.
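The provisional score of S401 is a base-2 log of the ratio between two densities: the term's density in the hit documents and its density in all MEDLINE documents. It is short enough to check directly. The function and parameter names below are hypothetical, and all counts are assumed to be positive. That assumption holds for extracted terms, since each appears in at least one hit document.

```python
import math


def term_score(hits_with_term: int, hit_total: int,
               docs_with_term: int, medline_total: int) -> float:
    """Provisional log-odds term score; every count must be positive."""
    p_t = hits_with_term / hit_total      # density of the term in the hit documents
    q_t = docs_with_term / medline_total  # density of the term in all documents
    return math.log2(p_t) - math.log2(q_t)
```

As a worked example with made-up counts, a term found in 50 of 100 hit documents and in 1,000 of 1,000,000 MEDLINE documents has p_t = 0.5 and q_t = 0.001. Its score is log2 500 ≈ 8.97. A term as common among the hits as in MEDLINE as a whole scores 0.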

[0086] Next, the term score calculation unit 124 of the additional information generation unit 122 adjusts the provisionally calculated term score (S402 to S404).

In S402, the number of hit documents is compared with a specific first threshold. In S403, the number of documents per term in the hit documents for the representative term to be processed is compared with a specific second threshold. In S404, if the number of hit documents is less than the first threshold and the number of documents per term in the hit documents is less than the second threshold, the term score of the representative term to be processed is changed to “−∞”. The first threshold (for the number of hit documents) and the second threshold (for the number of documents per term in the hit documents) are assumed to be stored in advance in the storage device of the document search apparatus 100. Multiple first and second thresholds may be provided so that the term score is adjusted under multiple conditions. For example, a third threshold may be set for the number of documents per term in the hit documents. In addition to the above condition, the term score of the representative term to be processed may then be changed to “−∞” when the number of hit documents is greater than or equal to the first threshold and the number of documents per term in the hit documents is less than the third threshold. Term score adjustment is also not limited to setting the term score to “−∞” when the conditions are met. For example, the term score may be reduced by a certain rate or by a specific value when the conditions are met, or increased by a certain rate or by a specific value when they are not met.
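The adjustment of S402 to S404, including the optional third-threshold variant, can be sketched as a small function. This is an illustration only. The function name and parameters are hypothetical, and the text does not give any threshold values, so the values in the usage note are made up.

```python
import math
from typing import Optional


def adjust_term_score(score: float, hit_total: int, hits_with_term: int,
                      first_threshold: int, second_threshold: int,
                      third_threshold: Optional[int] = None) -> float:
    """Sketch of S402-S404: demote terms considered unimportant to -infinity."""
    # S402/S403: few hit documents overall AND few hit documents for this term.
    if hit_total < first_threshold and hits_with_term < second_threshold:
        return -math.inf  # S404
    # Optional variant: larger hit sets use a separate (third) per-term threshold.
    if (third_threshold is not None and hit_total >= first_threshold
            and hits_with_term < third_threshold):
        return -math.inf
    return score
```

With a first threshold of 20 and a second of 3, a term seen in 1 of 10 hit documents is demoted, while the same term among 100 hit documents keeps its score unless a third threshold is also supplied. Replacing the `-math.inf` returns with a rate or an offset would give the softer adjustments mentioned above.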

[0088] A term score calculated in S401 may be high merely because the number of hit documents is small, and such a score does not necessarily indicate that the MeSH term is important. In particular, a MeSH term with a small number of hit documents and a small number of documents per term in the hit documents is not considered important. Therefore, S402 to S404 lower the term score of a MeSH term considered unimportant.

Then, the term score calculation unit 124 sets the term score in the extracted term list in association with the representative term to be processed (S405).

Next, the search result display process (S104) of the search result display unit 105 will be described.

The search result display unit 105 acquires the extracted term list (search result information) from the search result storage unit 193 and displays it in the result (4) of the search result screen, as shown in FIG. 17.

FIG. 17 is a diagram showing a result (4) of the search result screen according to the first embodiment.

The result (4) of the search result screen when the user specifies the extracted term classification display (GDCT) in the submenu (3) of the search result screen is described below with reference to FIG. 17.

[0092] In FIG. 17, [1] is a check box, [2] is the term score (Score), [3] is the number of documents per term in the hit documents (Frequency), [4] is the representative term (Heading), [5] is the description (Description), [6] is an OMIM icon, and [7] is a PubMed icon.

[0093] The search result display unit 105 displays the above items ([1] to [7]) for each extracted term, grouped by each term's classification as acquired by the classification acquisition unit 123 and set in the extracted term list. FIG. 17 shows the display of extracted terms classified as genes (Gene).

[0094] The check box [1] is used to select a MeSH term to be added to the query as a search condition for the narrowed-down search.

The query generation unit 115 generates a query to which each representative term [4] checked by the user with the mouse 903 (an example of an input device) is added under an AND (logical product), OR (logical sum), or NOT (negation) condition.
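A minimal sketch of this query generation follows. It assumes PubMed-style query syntax, with upper-case Boolean operators and the `[MeSH Terms]` field tag. The text itself does not fix a syntax, and the function name is hypothetical. The current query is parenthesized so that the added conditions apply to it as a whole.

```python
def add_terms_to_query(current_query: str, checked_terms: list[str],
                       operator: str = "AND") -> str:
    """Add the checked representative terms to the current query under one operator."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    clauses = [f'"{term}"[MeSH Terms]' for term in checked_terms]
    if not current_query:
        if operator == "NOT":
            raise ValueError("NOT needs an existing query to exclude terms from")
        return f" {operator} ".join(clauses)
    return " ".join([f"({current_query})"] + [f"{operator} {c}" for c in clauses])
```

For example, `add_terms_to_query("breast cancer", ["BRCA1 Protein"])` yields `(breast cancer) AND "BRCA1 Protein"[MeSH Terms]`.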

[0095] The term score (Score) [2] indicates the term score, the number of documents per term in the hit documents (Frequency) [3] indicates that number, the representative term (Heading) [4] indicates the representative term, and the description (Description) [5] describes the representative term [4].

[0096] The OMIM icon [6] designates display of the OMIM search result page that uses the MeSH term (representative term [4]) as a search condition. The PubMed icon [7] designates display of the PubMed search result page that uses the MeSH term (representative term [4]) as a search condition.

When the user presses the OMIM icon [6] or the PubMed icon [7] with the mouse 903 (an example of an input device), the query generation unit 115 generates a query that adds the corresponding MeSH term, under an AND condition, to the current query used to access MEDLINE. Next, the MEDLINE access unit 116 generates a URL (including the query) for accessing the search page (OMIM or PubMed) corresponding to the pressed icon. It accesses that page via the Internet and acquires the search result page information corresponding to the query generated by the query generation unit 115. Based on the information acquired by the MEDLINE access unit 116, the search result display unit 105 displays the search result page provided by OMIM or PubMed on the Internet.
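The URL generation step can be sketched as follows. The two base URLs are assumptions about today's public search pages and are not taken from the text. The exact parameters OMIM accepts in particular should be checked before use. The function name is hypothetical, and the MeSH term is ANDed onto the current query in plain form, without field tags.

```python
from urllib.parse import urlencode

# Assumed present-day search pages and their query parameter names (not from the source).
SEARCH_PAGES = {
    "PubMed": ("https://pubmed.ncbi.nlm.nih.gov/", "term"),
    "OMIM": ("https://www.omim.org/search", "search"),
}


def search_page_url(site: str, current_query: str, mesh_term: str) -> str:
    """Build the URL for the pressed icon: current query AND the row's MeSH term."""
    base, param = SEARCH_PAGES[site]  # KeyError for an unknown site
    query = f"({current_query}) AND {mesh_term}"
    return f"{base}?{urlencode({param: query})}"
```

`urlencode` percent-encodes the parentheses and turns spaces into `+`. For example, `search_page_url("PubMed", "cancer", "BRCA1")` gives `https://pubmed.ncbi.nlm.nih.gov/?term=%28cancer%29+AND+BRCA1`.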

[0097] In the result (4) of the search result screen, the user may use the mouse 903 (an example of an input device) to press the name of the term score [2] (Score), number of documents per term in the hit documents [3] (Frequency), or representative term [4] (Heading) column. The display change unit 103 then sorts the search result information by the pressed item (term score, number of documents per term in the hit documents, or representative term; display designation information). The search result display unit 105 changes the display of the search result based on the sorted search result information (display change information).

[0098] Embodiment 1 above explained that additional information (classification and term score) can be output together with the extracted terms (which here also means the representative terms corresponding to the extracted terms) as information for narrowing down the document information.

[0099] It was also explained that the extracted terms are displayed together with their additional information, and that an extracted term can be added to the search condition by checking its check box [1] in the result (4) of the search result screen. In other words, a search condition for narrowing down the document information can easily be added.

[0100] Furthermore, it has been explained that the MeSH terms are classified according to the classification index number (specific MeSH number).

[0101] It was also explained that the lower a MeSH term's frequency of occurrence in all the document information of MEDLINE 200, and the higher its frequency of occurrence in the hit document information, the higher its term score.

[0102] Furthermore, it was explained that a MeSH term receives a low term score when the number of hit documents is small and its number of documents per term in the hit documents is also small.

[0103] In Embodiment 1 above, the MeSH term extraction unit 121 may instead generate an extracted term list containing the extracted terms themselves rather than their corresponding representative terms, and the additional information generation unit 122 may generate additional information from that list. That is, the extracted terms themselves and their additional information may be output. However, a representative term uniquely identifies a MeSH term. Outputting the representative terms corresponding to the extracted terms, together with their additional information, therefore lets the user specify a more appropriate MeSH term as a search condition for narrowing down.

[0104] Also, in Embodiment 1, the document information need not be document information recorded in MEDLINE 200, nor need it be information on medical documents.

[0105] Likewise, in Embodiment 1, the document terms need not be MeSH terms, nor need they be medical terms.

[0106] Further, the classification described in Embodiment 1 may be designated as a search condition. When a classification is specified together with a MeSH term, only the representative terms of that classification are used as the search condition, even if several representative terms have the specified MeSH term as a synonym. This lets the user specify a more appropriate MeSH term as a search condition for a narrowed-down search. In this case, the MeSH term included in the specified query is a representative term or a synonym. The query generation unit 115 acquires from the term DB 192 the representative term of the classification specified in the query and generates the query by replacing the MeSH term with the acquired representative term.
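The classification-restricted replacement can be sketched as follows. The list `TERM_DB` is a hypothetical stand-in for the term DB 192, and its headings, synonyms, and class names are illustrative only. The function names are also hypothetical. When no entry of the requested class matches, the sketch keeps the user's term unchanged. The text does not say what happens in that case, so this is an assumption.

```python
# Hypothetical term DB 192 entries; headings, synonyms, and classes are illustrative only.
TERM_DB = [
    {"heading": "Common Cold", "synonyms": {"Cold", "Coryza"}, "classification": "Disease"},
    {"heading": "Cold Temperature", "synonyms": {"Cold"}, "classification": "Phenomena"},
]


def representative_terms(term, classification, term_db):
    """Representative terms of the given class whose entry matches `term`."""
    return [entry["heading"] for entry in term_db
            if entry["classification"] == classification
            and (term == entry["heading"] or term in entry["synonyms"])]


def rewrite_query_terms(terms, classification, term_db):
    """Replace each query term by its representative term(s) within the class."""
    rewritten = []
    for term in terms:
        heads = representative_terms(term, classification, term_db)
        rewritten.extend(heads if heads else [term])  # assumption: keep unmatched terms
    return rewritten
```

Here the ambiguous synonym "Cold" resolves to "Common Cold" when the class "Disease" is specified, and to "Cold Temperature" under "Phenomena".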

[0107] In Embodiment 1, setting the “term description” (S302) and the “classification” (S303) in the extracted term list may instead be performed when the extracted term list is generated (S205). This reduces access to the term DB 192 and shortens processing time.

[0108] As described above, Embodiment 1 can support a narrowed-down document search for acquiring useful document information.

[0109] Embodiment 2.

The following describes how, as information for narrowing down the document information, the MeSH number is output together with the classification and term score for each MeSH term included in the hit document information. Only the parts that differ from Embodiment 1 are described; the other parts are the same as in Embodiment 1.

[0110] In S303 of the document search process described in Embodiment 1, a MeSH number acquisition unit of the additional information generation unit 122 performs a MeSH number acquisition process. It outputs the MeSH number corresponding to the representative term of each extracted term, as additional information, in the search result information. In the MeSH number acquisition process, the MeSH number (an example of additional information) indicated by the MeSH term information output by the term DB 192 is set in the extracted term list in association with the representative term to be processed. The other processes and the configuration of the document retrieval apparatus 100 in Embodiment 2 are the same as in Embodiment 1.

FIG. 18 is a diagram showing a result (4) of the search result screen in the second embodiment.

The result (4) of the search result screen when the user specifies the extracted term MeSH number display (MeSH) in the submenu (3) of the search result screen is described below with reference to FIG. 18.

[0112] In FIG. 18, [1] is a list box for selecting the hierarchy of MeSH terms to be displayed, [2] is the MeSH number, and [3] is a MeSH icon for displaying the MeSH search result page. The rest of the screen configuration is the same as in Embodiment 1.

[0113] In FIG. 18, when the user specifies the hierarchy of MeSH terms to be displayed in the list box [1] with the mouse 903 (an example of an input device), the display change unit 103 outputs the extracted term list (search result information) for the extracted terms in the specified hierarchy (display change information). The search result display unit 105 changes the display of the search result based on the search result information (display change information) output from the display change unit 103.

At this time, when a MeSH number is specified in the list box, the display change unit 103 compares the MeSH number of each extracted term in the extracted term list with the specified MeSH number by prefix (forward) match. It then outputs the extracted term list for the terms that match. Extracted terms whose MeSH numbers begin with the specified MeSH number are those in the specified hierarchy and the hierarchies below it. Alternatively, the display change unit 103 may output the extracted term list for the extracted terms whose MeSH numbers exactly match the specified MeSH number. These are the extracted terms in the specified hierarchy only.

When a MeSH term is specified in the list box, the display change unit 103 acquires the MeSH number of the specified MeSH term from the term DB 192 or the extracted term list. It then outputs the extracted term list for the specified hierarchy in the same way as when a MeSH number is specified.
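The hierarchy filter can be sketched in Python. It uses the fact that a MeSH tree number extends its parent's number with a dot-separated segment. The row layout, the function names, and all tree numbers in `ROWS` are illustrative assumptions. One design choice differs from a plain string forward match: matching on a `.` boundary keeps a specified number such as `C04.58` from also selecting `C04.588`.

```python
# Illustrative extracted-term rows; the tree numbers are examples, not authoritative MeSH data.
ROWS = [
    {"heading": "Neoplasms", "mesh_numbers": ["C04"]},
    {"heading": "Neoplasms by Site", "mesh_numbers": ["C04.588"]},
    {"heading": "Breast Neoplasms", "mesh_numbers": ["C04.588.180", "C17.800.090.500"]},
    {"heading": "Insulin", "mesh_numbers": ["D06.999"]},
]


def in_hierarchy(mesh_number, specified, include_lower=True):
    """Prefix match on tree-number boundaries (lower levels), or exact match only."""
    if include_lower:
        return mesh_number == specified or mesh_number.startswith(specified + ".")
    return mesh_number == specified


def filter_by_hierarchy(rows, specified, include_lower=True):
    """Rows having at least one MeSH number in the specified hierarchy."""
    return [row for row in rows
            if any(in_hierarchy(n, specified, include_lower) for n in row["mesh_numbers"])]


def filter_by_term(rows, heading, include_lower=True):
    """Look up the specified term's MeSH numbers, then filter as for a number."""
    numbers = next((row["mesh_numbers"] for row in rows if row["heading"] == heading), [])
    return [row for row in rows
            if any(in_hierarchy(n, s, include_lower)
                   for n in row["mesh_numbers"] for s in numbers)]
```

Selecting `C04` returns the three neoplasm rows, while `include_lower=False` returns only "Neoplasms". A term with several tree numbers is shown if any of its numbers lies in the hierarchy.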

[0114] Further, when the user presses the MeSH icon [3] with the mouse 903 (an example of an input device), the MEDLINE access unit 116 generates a URL (including the query) that uses the MeSH number [2] as the query. It accesses the search page of the MeSH database on the Internet and acquires the search result page information. The search result display unit 105 then displays the search result page provided by the MeSH database on the Internet, based on the information acquired by the MEDLINE access unit 116.

[0115] In the second embodiment, it has been described that MeSH terms in a specified hierarchy among MeSH terms included in hit document information can be output as information for narrowing down medical document information. As a result, it is possible to support a narrowed-down document search for obtaining useful document information.

[0116] In addition, in Embodiment 2 described above, it is the same as in Embodiment 1 that the narrowed-down document search for obtaining useful document information can be supported.

[0117] Embodiment 3.

The following describes a mode in which the number of currently extracted document terms that are identical to document terms extracted in past searches is output as search transition information. Only the parts that differ from the first embodiment are described; the other parts are the same as in the first embodiment.

[0118] In S103 of the document search process described in the first embodiment, the search transition information generation unit 126 performs a search transition information generation process and generates search transition information.

In S104 of the document search process, the search result display unit 105 acquires the search transition information (search result information) generated by the search transition information generation unit 126 from the search result storage unit 193, and displays the history (5) of the search result screen described below.

In addition, the search result generation unit 120 stores the generated search result information in the search result storage unit 193 in association with the MEDLINE result information, the search execution date and time, and the query used for the search.

Other processes and configuration of the document retrieval apparatus 100 in the third embodiment are the same as those in the first embodiment.

[0119] In the following description, "result information" refers to the MEDLINE result information and the search result information taken together.

[0120] The search result generation process (S103) of the search result generation unit 120 in the third embodiment will be described.

First, as in the first embodiment, the MeSH term extraction unit 121 performs MeSH term extraction processing.

Then, the search transition information generation unit 126 of the search result generation unit 120 performs search transition information generation processing for generating search transition information based on the current result information and past result information.

FIG. 19 is a flowchart showing the flow of search transition information generation processing of the search transition information generation unit 126 in the third embodiment.

The search transition information generation process performed by the search transition information generation unit 126 within the search result generation process (S103) of the search result generation unit 120 is described below with reference to FIG. 19.

First, the search transition information generation unit 126 acquires the past (previous) result information from the search result storage unit 193, and also acquires the current MEDLINE result information (S501).

Next, in order to perform the process of S503 on every piece of current hit document information (included in the MEDLINE result information), the search transition information generation unit 126 determines whether all current hit document information has been processed (S502).

If unprocessed current hit document information remains, the search transition information generation unit 126 compares the current hit document information to be processed (any unprocessed item) with the past hit document information and, if it overlaps any of the past hit document information, adds it to a duplicate document list. The process then returns to S502 (S503).

If no unprocessed current hit document information remains (all current hit document information has been processed), then, in order to perform the process of S505 on every current representative term corresponding to the extracted terms in the extracted term list included in the search result information, the search transition information generation unit 126 determines whether all current representative terms have been processed (S504).

If an unprocessed representative term remains, the search transition information generation unit 126 compares the current representative term to be processed (any unprocessed representative term) with the representative terms corresponding to the past extracted terms and, if it overlaps any of them, adds it to a duplicate term list. The process then returns to S504 (S505).

If no unprocessed representative term remains (all current representative terms have been processed), the search transition information generation unit 126 stores the generated duplicate document list and duplicate term list in the search result storage unit 193 as search result information. At this time, the search transition information generation unit 126 stores the search result information in association with the MEDLINE result information, the search execution date and time, and the query used for the search (S506).
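The loop of S501 to S506 can be sketched as below. This is a minimal sketch under stated assumptions: each result is modeled by a hypothetical `ResultInfo` holding hit document identifiers and the representative terms of the extracted terms, and the storage step of S506 (saving with date, time, and query) is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class ResultInfo:
    # Hypothetical record: hit document IDs and the representative terms
    # corresponding to the extracted terms of one search.
    hit_documents: list = field(default_factory=list)
    representative_terms: list = field(default_factory=list)


def generate_search_transition(current, past):
    """Return (duplicate document list, duplicate term list)."""
    past_documents = set(past.hit_documents)
    past_terms = set(past.representative_terms)

    # S502-S503: check every current hit document against the past hits.
    duplicate_documents = []
    for document in current.hit_documents:
        if document in past_documents:
            duplicate_documents.append(document)

    # S504-S505: check every current representative term against past terms.
    duplicate_terms = []
    for term in current.representative_terms:
        if term in past_terms:
            duplicate_terms.append(term)

    # S506 (storing with execution date/time and query) is left to the caller.
    return duplicate_documents, duplicate_terms


current = ResultInfo(["101", "102", "103"], ["Apoptosis", "Neoplasms"])
past = ResultInfo(["102", "103", "104"], ["Neoplasms", "Mutation"])
print(generate_search_transition(current, past))
# (['102', '103'], ['Neoplasms'])
```

The past items are put into sets so that each comparison in S503 and S505 is a constant-time lookup, while the duplicate lists keep the order of the current result.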

FIGS. 20 and 21 are conceptual diagrams of search transition information in the third embodiment.

As shown in FIG. 20, as a result of the search transition information generation process described above (S501 to S506), the search transition information generation unit 126 generates, as search transition information, the information that overlaps between the current result information (result information 1) and the previous result information (result information 2), namely the hit documents and the representative terms corresponding to the extracted terms.

Further, as shown in FIG. 21, the search transition information generation unit 126 may generate, as search transition information, the information that overlaps between the current result information (result information 1) and a plurality of past result information (result information 2 and result information 3).

In this case, the search transition information generation unit 126 treats the previously generated search transition information as past result information in the search transition information generation process (S501 to S506), and generates the current search transition information.
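The multi-result case of FIG. 21 can be sketched by feeding each generated transition back in as the "past result information" of the next step. This is a minimal sketch, assuming each result is a plain list of items (hit document IDs or representative terms) ordered oldest first; the names `overlap` and `transition_over_history` are hypothetical.

```python
def overlap(current_items, past_items):
    """Items of current_items that also appear in past_items (current order)."""
    past_set = set(past_items)
    return [item for item in current_items if item in past_set]


def transition_over_history(results):
    """results: item lists ordered oldest first; the last one is the current
    result. Each step treats the previous transition as past result info."""
    if not results:
        raise ValueError("at least one result is required")
    transition = results[0]
    for result in results[1:]:
        transition = overlap(result, transition)
    return transition


result3 = ["c", "e"]            # oldest
result2 = ["b", "c", "d"]
result1 = ["a", "b", "c"]       # current
print(transition_over_history([result3, result2, result1]))  # ['c']
```

Because each step is an intersection, the final transition contains only the items common to all of the results considered.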

FIG. 22 is a diagram showing the history (5) of the search result screen in the third embodiment. In FIG. 22, [1] is the clear button, [2] is the history number (No), [3] is the execution date and time (Time), [4] is the query (Query), [5] is the number of hit documents (Article-Num), [6] is the number of overlapping documents (Article-Overlap), [7] is the number of extracted terms (Term-Num), and [8] is the number of overlapping terms (Term-Overlap).

[0125] The clear button [1] is a button for designating deletion of past result information.

When the user presses the clear button using the mouse 903 (an example of an input device), the document search apparatus 100 deletes past result information stored in the search result storage unit 193.

[0126] History number [2] indicates a number for identifying current and past searches.

The search result display unit 105 assigns history numbers in ascending order, starting from the search with the latest execution date and time.

[0127] Execution date and time [3] indicates the date and time each search was executed, query [4] indicates the query used in each search, number of hit documents [5] indicates the number of hit documents in each search, number of overlapping documents [6] indicates the number of documents in the duplicate document list, number of extracted terms [7] indicates the number of representative terms corresponding to the extracted terms, and number of overlapping terms [8] indicates the number of representative terms in the duplicate term list.
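How the rows of the history (5) could be assembled from stored searches is sketched below. This is an illustration only: the `SearchRecord` fields and the column keys (taken from the labels in FIG. 22) are assumptions about layout, not the device's actual data format. History number 1 goes to the most recently executed search, as described above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SearchRecord:
    # Hypothetical stored search: the date/time, query, and lists saved in S506.
    executed_at: datetime
    query: str
    hit_documents: list
    duplicate_documents: list
    representative_terms: list
    duplicate_terms: list


def history_rows(records):
    """Build history rows; No. 1 is the most recently executed search."""
    ordered = sorted(records, key=lambda record: record.executed_at, reverse=True)
    rows = []
    for number, record in enumerate(ordered, start=1):
        rows.append({
            "No": number,
            "Time": record.executed_at.strftime("%Y-%m-%d %H:%M"),
            "Query": record.query,
            "Article-Num": len(record.hit_documents),
            "Article-Overlap": len(record.duplicate_documents),
            "Term-Num": len(record.representative_terms),
            "Term-Overlap": len(record.duplicate_terms),
        })
    return rows
```

For example, `history_rows([older, newer])` returns the newer search first with `"No": 1`.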

[0128] It has been explained that, in the third embodiment, the number of currently extracted document terms (or documents) that are identical to document terms (or documents) extracted in the past can be output as search transition information. This lets the user recognize the status of the search transition: if the number of document terms (or documents) indicated by the search transition information is extremely small compared with the number extracted this time, the narrowed-down search is thought to have been performed with an inappropriate search condition, whereas if it is moderately smaller, the narrowed-down search is thought to have been performed with an effective search condition. Thus, the third embodiment can support a narrowed-down document search for obtaining useful document information.

[0129] In addition, as in Embodiment 1, Embodiment 3 described above can support a narrowed-down document search for obtaining useful document information.

[0130] Embodiment 4.

The following describes a mode in which representative terms that have a user-specified term as a synonym are acquired, and document information is searched using, as a search condition, the representative term that the user designates from among the acquired representative terms. Only the parts that differ from the above embodiments are described; the other parts are the same as in those embodiments.

FIG. 23 is a flowchart showing the flow of the document search process of the document search unit 110 in the fourth embodiment. The document search process (S102) of the document search unit 110 when the user designates the synonym search function (Dictionary) in the main menu (1) of the search result screen is described below with reference to FIG. 23.

[0132] In the document search process (S102), the synonym search unit 111 first performs a synonym search process (S601 to S603) that outputs the representative terms corresponding to the term specified by the user (search term) and lets the user designate the terms to be used for the search (specified terms). Next, the query generation unit 115 performs a query generation process that generates a query combining the specified terms with an AND condition or an OR condition (S604). Then, the MEDLINE access unit 116 performs a MEDLINE access process that accesses MEDLINE 200 with the query generated by the query generation unit 115 and acquires hit document information (S605).
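The query generation of S604 can be sketched as follows. The specification does not state the query syntax; this sketch assumes a PubMed-style query in which each specified term is quoted and tagged with the `[MeSH Terms]` field, and the function name `generate_query` is hypothetical.

```python
def generate_query(specified_terms, operator="AND"):
    """Combine the specified terms with an AND or OR condition (S604).

    Assumption: PubMed-style syntax with a [MeSH Terms] field tag per term.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not specified_terms:
        raise ValueError("at least one specified term is required")
    clauses = [f'"{term}"[MeSH Terms]' for term in specified_terms]
    return f" {operator} ".join(clauses)


print(generate_query(["Neoplasms", "Apoptosis"]))
# "Neoplasms"[MeSH Terms] AND "Apoptosis"[MeSH Terms]
print(generate_query(["Neoplasms", "Apoptosis"], operator="OR"))
# "Neoplasms"[MeSH Terms] OR "Apoptosis"[MeSH Terms]
```

AND narrows the hit set to documents indexed with every specified term, whereas OR widens it to documents indexed with any of them.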

Next, details of the synonym search processing (S601 to S603) of the synonym search unit 111 will be described.

In S601, the search term input unit 112 of the synonym search unit 111 inputs a term specified by the user from an input device as a search term. For example, the user enters an arbitrary term from the keyboard 902 (an example of an input device), or a file in which the user has set terms is read from a storage device (an example of an input device). One or more terms may be input.

In S602, the representative term acquisition unit 113 of the synonym search unit 111 accesses the term DB 192 with the search term input by the search term input unit 112 and acquires the representative term corresponding to the search term. At this time, the term DB 192 searches for MeSH term information (entries) whose representative term or synonymous term matches the search term, and outputs the representative term of each matching entry to the representative term acquisition unit 113.
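The lookup of S602 can be sketched as a scan over term DB entries, matching each search term against an entry's representative term and its synonymous terms. This is a minimal sketch under assumptions: the term DB is modeled as a hypothetical dictionary from representative term to synonym list, matching is case-insensitive (an assumption not stated in the specification), and the sample entries are illustrative only.

```python
def acquire_representative_terms(search_terms, term_db):
    """Map each search term to the representative terms of matching entries.

    term_db: dict of representative term -> list of synonymous terms
    (hypothetical layout of the term DB 192). Case-insensitive matching
    is an assumption.
    """
    results = {}
    for search_term in search_terms:
        key = search_term.casefold()
        matches = []
        for representative, synonyms in term_db.items():
            candidates = [representative, *synonyms]
            if any(candidate.casefold() == key for candidate in candidates):
                matches.append(representative)
        results[search_term] = matches
    return results


# Illustrative entries only.
term_db = {"Neoplasms": ["Tumors", "Cancer"], "Apoptosis": ["Programmed Cell Death"]}
print(acquire_representative_terms(["cancer", "Apoptosis", "unknown"], term_db))
# {'cancer': ['Neoplasms'], 'Apoptosis': ['Apoptosis'], 'unknown': []}
```

Processing a whole list of search terms in one call mirrors the collective acquisition for many terms (for example, many genes) described for this embodiment.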

In S603, the designated term acquisition unit 114 of the synonym search unit 111 outputs the representative terms acquired by the representative term acquisition unit 113 to the search result display unit 105, and the search result display unit 105 displays the result (4) of the search result screen, for example as shown in FIG. 24.

FIG. 24 is a diagram showing a result (4) of the search result screen in the fourth embodiment.

In FIG. 24, [1] is a search term (Input), [2] is a check box, and [3] is a representative term (Heading).

[0135] Search term [1] indicates a search term, and representative term [3] indicates a representative term.

[0136] Check box [2] is a check box for selecting a MeSH term (specified term) as a search condition.

The designated term acquisition unit 114 acquires, as specified terms, the representative terms [3] whose check boxes the user has checked using the mouse 903 (an example of an input device).

[0137] It has been explained that, in Embodiment 4 described above, representative terms that have a user-specified term as a synonymous term can be acquired, and document information corresponding to the representative terms that the user designates from among them can be searched. Because multiple synonymous terms are commonly in use for a single MeSH term, the search terms specified by the user do not always retrieve all the necessary document information. According to the fourth embodiment, the user can designate a more appropriate MeSH term as a search condition for a narrowed-down search, which supports a narrowed-down document search for obtaining useful document information. This is particularly effective when document information is to be searched with a large number of terms as search conditions, for example when examining document information on many genes whose expression was detected with a microarray.

[0138] In addition, as in Embodiment 1, Embodiment 4 described above can support a narrowed-down document search for obtaining useful document information.

[0139] Embodiment 5.

The following describes a mode in which document information is searched by the broad search method. Only the parts that differ from the above embodiments are described; the other parts are the same as in those embodiments.

[0140] Here, the broad search method is a search method that uses, as search conditions, the MeSH terms included in the search sentence and the MeSH terms in the layers below them. The narrow search method is a search method that uses only the MeSH terms included in the search sentence as search conditions.

For example, the search method is stored in the user information DB 191 as user information. Then, when accessing the MEDLINE 200, the document search unit 110 performs the following processing according to the search method stored in the user information DB 191.

[0141] When the broad search method is specified, the query generation unit 115 generates, from the query with which the MEDLINE access unit 116 accesses MEDLINE 200 in each of the above embodiments, a query to which the MeSH terms in the layers below each MeSH term are added as conditions. The MEDLINE access unit 116 accesses MEDLINE 200 with this query generated by the query generation unit 115 and acquires MEDLINE result information.

[0142] In this case, the query generation unit 115 accesses the term DB 192 with each term in the query. The term DB 192 looks up the MeSH number of the MeSH term that matches each term output by the query generation unit 115, searches for the MeSH numbers that match that MeSH number by prefix, and outputs the MeSH terms having those prefix-matching MeSH numbers to the query generation unit 115. The query generation unit 115 then generates a single query in which the MeSH terms output from the term DB 192 are combined with OR conditions. The MEDLINE access unit 116 accesses MEDLINE 200 with this query and acquires MEDLINE result information.
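The broad-search expansion of [0142] can be sketched as below. Assumptions, all hypothetical: the term DB is modeled as a dictionary from MeSH term to its MeSH numbers (a term may have several), query terms with no MeSH number are passed through unchanged (the specification does not cover this case), and each term is simply quoted in the OR query without any database-specific field tag.

```python
def expand_broad(query_terms, term_numbers):
    """Add every MeSH term whose MeSH number prefix-matches a query term's number.

    term_numbers: dict of MeSH term -> list of MeSH numbers (hypothetical
    layout of the term DB 192).
    """
    expanded = []

    def add(term):
        if term not in expanded:
            expanded.append(term)

    for term in query_terms:
        numbers = term_numbers.get(term)
        if not numbers:
            add(term)  # assumption: non-MeSH terms are kept unchanged
            continue
        for prefix in numbers:
            for candidate, candidate_numbers in term_numbers.items():
                # The prefix match also selects the term itself.
                if any(number.startswith(prefix) for number in candidate_numbers):
                    add(candidate)
    return expanded


def broad_query(query_terms, term_numbers):
    """Combine the expanded terms with OR conditions into one query."""
    return " OR ".join(f'"{term}"' for term in expand_broad(query_terms, term_numbers))


# Placeholder terms and numbers for illustration.
numbers = {"term A": ["C04"], "term B": ["C04.557"],
           "term C": ["C04.557.470"], "term D": ["C14"]}
print(broad_query(["term B"], numbers))  # "term B" OR "term C"
```

Under the narrow search method, the expansion step would simply be skipped and the query terms used as given.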

[0143] It has been explained that, in the fifth embodiment, document information can be searched with the search method designated from narrow search and broad search, and the MeSH terms included in the hit document information can be output as information for narrowing down medical document information. Designating the broad search makes it possible to obtain document information related to the terms below the search terms specified by the user. Since document information related to a user-specified search term can thus be acquired comprehensively, a narrowed-down document search for obtaining useful document information can be supported.

[0144] In addition, as in Embodiment 1, Embodiment 5 described above can support a narrowed-down document search for obtaining useful document information.

[0145] Embodiment 6.

The following describes a mode for handling updates to the document information recorded in MEDLINE 200. Only the parts that differ from the above embodiments are described; the other parts are the same as in those embodiments.

[0146] The MEDLINE update information acquisition unit 102 periodically accesses the NLM FTP site to acquire update information for MEDLINE 200 and, based on that information, updates the per-term document count over all documents in each MeSH term information entry of the term DB 192. At this time, the MeSH term extraction unit 121 treats the document information indicated by the update information as hit documents and generates an extracted term list in the same manner as in the first embodiment. The MEDLINE update information acquisition unit 102 then outputs the representative terms set in the extracted term list, together with the per-term document counts in the hit documents, to the term DB 192. The term DB 192 adds each per-term document count in the hit documents output by the MEDLINE update information acquisition unit 102 to the per-term document count over all documents stored in association with the same representative term. In this way, the MEDLINE update information acquisition unit 102 updates the per-term document count over all documents in each MeSH term information entry of the term DB 192 based on the update information.
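The count update of [0146] amounts to counting, for the newly added documents, how many contain each representative term, and adding those counts to the stored all-document counts. The sketch below assumes the stored counts are a `collections.Counter` keyed by representative term and that each updated document is given as its set of representative terms; both layouts and the function name are hypothetical.

```python
from collections import Counter


def update_document_counts(all_doc_counts, update_documents):
    """Add per-term document counts of newly added documents to the totals.

    all_doc_counts: Counter of representative term -> number of documents
    (hypothetical stand-in for the counts kept in the term DB 192).
    update_documents: one collection of representative terms per new document.
    """
    hit_counts = Counter()
    for terms in update_documents:
        hit_counts.update(set(terms))  # count each term at most once per document
    all_doc_counts.update(hit_counts)  # Counter.update adds counts
    return hit_counts


counts = Counter({"Neoplasms": 10, "Apoptosis": 3})
added = update_document_counts(counts, [{"Neoplasms", "Mutation"}, {"Neoplasms"}])
print(added)   # Counter({'Neoplasms': 2, 'Mutation': 1})
print(counts)  # Counter({'Neoplasms': 12, 'Apoptosis': 3, 'Mutation': 1})
```

Keeping these all-document counts current is what allows the term score of the first embodiment to be recalculated after new documents are added.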

[0147] The MEDLINE update information acquisition unit 102 also notifies the user of updates to the document information of MEDLINE 200. In this case, the user information DB 191 stores the user's e-mail address as user information, and the MEDLINE update information acquisition unit 102 acquires the e-mail address from the user information DB 191 and sends an e-mail notifying the update of the MEDLINE 200 document information to that address. Furthermore, a query defined by the user may be stored as user information; when MEDLINE 200 is updated, the MEDLINE update information acquisition unit 102 may access MEDLINE 200 with the query in the user information and notify the user of the search result.

[0148] It is also desirable that user information can be changed, such as the e-mail address to which updates of the MEDLINE 200 document information are notified, the designation of the notification method by e-mail, the user-defined query, the threshold for adjusting the term score described in the first embodiment, the hierarchy of MeSH terms set in the list box described in the second embodiment (FIG. 18 [1]), the search method (broad search or narrow search) described in the fifth embodiment, and the user password.

For example, the user environment setting unit 101 displays, on the CRT display device 901 (an example of a display device), check boxes and text boxes for specifying environment setting information such as the e-mail address, the user-defined query, the term score threshold, the MeSH term hierarchy list, and the search method, as well as the password, and updates the user information in the user information DB 191 with the information specified by the user.

[0149] In Embodiment 6 described above, it has been explained that it is possible to cope with the update of the MEDLINE200 document information. That is, even if new document information is added to MEDLINE 200, the term score described in the first embodiment can be calculated.

[0150] Furthermore, it has been explained that the user can be notified of an update of the literature information of MEDLINE200. As a result, it is possible to promote a document search for document information newly added to MEDLINE200.

[0151] It has also been explained that the notification destination for MEDLINE 200 updates, the query whose search results are notified, the threshold for adjusting the term score described in the first embodiment, the hierarchy of MeSH terms set in the list box described in the second embodiment, the search method described in Embodiment 5, and the login password can be changed. As a result, user convenience can be improved.

[0152] As described above, the sixth embodiment can support a narrowed-down document search for obtaining useful document information.

Brief Description of Drawings

FIG. 1 is a configuration diagram of a document search apparatus 100 according to Embodiment 1.

FIG. 2 is a configuration diagram of a document search unit 110 in the first embodiment.

FIG. 3 is a configuration diagram of a search result generation unit 120 according to Embodiment 1.

FIG. 4 is a diagram showing a table of MeSH term information managed and stored in the term DB 192 in Embodiment 1.

FIG. 5 is a tree diagram of MeSH Descriptors.

FIG. 6 is a tree diagram of MeSH Descriptors.

FIG. 7 is a tree diagram of MeSH Descriptors.

FIG. 8 is a tree diagram of MeSH Descriptors.

FIG. 9 is a tree diagram of MeSH Descriptors.

FIG. 10 is a diagram showing a screen (search result screen) on which search result display section 105 in Embodiment 1 displays a search result.

FIG. 11 is a diagram showing an appearance of the document search device 100 according to the first embodiment.

FIG. 12 is a hardware configuration diagram of the document search device 100 according to the first embodiment.

FIG. 13 is a flowchart showing the flow of a document search process of the document search apparatus 100 in the first embodiment.

FIG. 15 is a flowchart showing the flow of additional information generation processing of the additional information generation unit 122 in the first embodiment.

FIG. 17 is a diagram showing a result (4) of the search result screen in the first embodiment.

FIG. 18 shows a result (4) of the search result screen in the second embodiment.

FIG. 19 is a flowchart showing the flow of search transition information generation processing of the search transition information generation unit 126 in the third embodiment.

FIG. 20 is a conceptual diagram of search transition information in the third embodiment.

FIG. 21 is a conceptual diagram of search transition information in the third embodiment.

FIG. 22 is a diagram showing a history (5) of a search result screen in the third embodiment.

FIG. 23 is a flowchart showing the flow of a document search process of the document search unit 110 in the fourth embodiment.

FIG. 24 is a diagram showing a result (4) of the search result screen in the fourth embodiment.

Explanation of Symbols

100 document search device, 101 user environment setting unit, 102 MEDLINE update information acquisition unit, 103 display change unit, 104 login unit, 105 search result display unit, 110 document search unit, 111 synonym search unit, 112 search term input unit, 113 representative term acquisition unit, 114 designated term acquisition unit, 115 query generation unit, 116 MEDLINE access unit, 120 search result generation unit, 121 MeSH term extraction unit, 122 additional information generation unit, 123 classification acquisition unit, 124 term score calculation unit, 125 MeSH number acquisition unit, 126 search transition information generation unit, 191 user information DB, 192 term DB, 193 search result storage unit, 200 MEDLINE, 210 OMIM, 220 MeSH, 901 CRT display device, 902 K/B, 903 mouse, 904 FDD, 905 CDD, 906 printer device, 907 scanner device, 910 system unit, 911 CPU, 912 bus, 913 ROM, 914 RAM, 915 communication board, 920 magnetic disk device, 921 OS, 922 window system, 923 programs, 924 files, 931 telephone, 932 FAX machine, 940 Internet, 941 Web server, 942 LAN.

Claims

The scope of the claims

[1] A document search device that connects to a document database storing a set of document information including thesaurus terms and to a classification database storing each thesaurus term in association with a classification of genes, diseases, compounds, or tissues, and searches for documents,

A search sentence input unit that inputs a search sentence from an input device;

A document information acquisition unit that acquires document information corresponding to the search sentence input by the search sentence input unit from the document database;

A thesaurus term extraction unit that extracts the thesaurus terms included in the acquired document information acquired by the document information acquisition unit;

A classification acquisition unit that acquires, for each thesaurus term extracted by the thesaurus term extraction unit, the classification of that thesaurus term from the classification database;

An output unit that associates the thesaurus terms extracted by the thesaurus term extraction unit with the classifications acquired by the classification acquisition unit and outputs them to an output device;

A document retrieval apparatus comprising:

[2] A document search device that connects to a document database storing a set of document information including thesaurus terms and to a classification database storing each thesaurus term in association with a classification of genes, diseases, compounds, or tissues based on the hierarchical structure of the thesaurus terms, and searches for documents,

A search sentence input unit that inputs a search sentence from an input device;

A document retrieval apparatus comprising:

[3] A document search device that connects to a document database storing a set of document information including thesaurus terms and searches for documents,

A search sentence input unit that inputs a search sentence from an input device;

A term score calculation unit that calculates, for each thesaurus term extracted by the thesaurus term extraction unit, a score equal to the ratio of the proportion of acquired document information including the thesaurus term among the acquired document information to the proportion of document information including the thesaurus term among the set of document information;

An output unit that associates the thesaurus terms extracted by the thesaurus term extraction unit with the scores calculated by the term score calculation unit and outputs them to an output device;

A document retrieval apparatus comprising:

[4] The document search device further comprising:

A storage device that stores a first threshold value and a second threshold value, according to the number of acquired document information, for the number of acquired document information including the thesaurus term;

The term score calculation unit

acquires the first threshold value and the second threshold value from the storage device and, based on the first threshold value, the second threshold value, the number of acquired document information, and the number of acquired document information including the thesaurus term, calculates a low score for a thesaurus term for which the number of acquired document information is less than the first threshold value and the number of acquired document information including the thesaurus term is less than the second threshold value.

The document retrieval apparatus according to claim 3, wherein:

[5] A document search device that connects to a document database storing a set of document information including thesaurus terms and searches for documents,

A search sentence input unit that inputs a search sentence from an input device;

A document information acquisition unit that acquires document information corresponding to the search sentence input by the search sentence input unit from the document database;

A thesaurus term extraction unit that extracts the thesaurus terms included in the acquired document information acquired by the document information acquisition unit and stores them in a storage device;

A duplicate term count calculation unit that calculates the number of thesaurus terms, among those currently extracted by the thesaurus term extraction unit, that are the same as thesaurus terms extracted in the past and stored in the storage device;

An output unit that outputs the number of thesaurus terms calculated by the duplicate term count calculation unit to an output device;

A document retrieval apparatus comprising:

[6] A document search device that connects to a document database storing a set of document information including thesaurus terms and to a hierarchy database storing hierarchy information of the thesaurus terms, and searches for documents,

A search sentence input unit that inputs a search sentence from an input device;

A thesaurus hierarchy acquisition unit that acquires, from the hierarchy database, hierarchy information about the thesaurus terms extracted by the thesaurus term extraction unit;

A designated hierarchy input unit that inputs the designation of a thesaurus term hierarchy;

A thesaurus term selection unit that selects, based on the hierarchy information acquired by the thesaurus hierarchy acquisition unit, the thesaurus terms corresponding to the hierarchy input by the designated hierarchy input unit from among the thesaurus terms extracted by the thesaurus term extraction unit;

An output unit that outputs the thesaurus terms selected by the thesaurus term selection unit to an output device;

A document retrieval apparatus comprising:

[7] A document search device that connects to a document database storing a set of document information including thesaurus terms and searches for documents,

A search method input unit that inputs, for a search sentence including thesaurus terms, the designation of a search method from among a narrow search method that uses the thesaurus terms included in the search sentence as search conditions and a broad search method that uses the thesaurus terms included in the search sentence and the thesaurus terms below them as search conditions;

A search sentence input unit that inputs a search sentence including thesaurus terms;

A document information acquisition unit that acquires document information corresponding to the search sentence input by the search sentence input unit from the document database according to the search conditions of the search method input by the search method input unit;

An output unit that outputs the thesaurus terms extracted by the thesaurus term extraction unit to an output device;

A document retrieval apparatus comprising:

[8] A document search device that connects to a document database storing a set of document information including thesaurus terms and to a synonym database storing synonyms of the thesaurus terms, and searches for documents,

A search term input unit that inputs a plurality of search terms from an input device;

A synonym acquisition unit that collectively acquires, from the synonym database, the thesaurus terms having each of the plurality of search terms input from the search term input unit as a synonym, and displays them on a display device;

A search term designation unit that designates thesaurus terms as search terms from among the thesaurus terms collectively acquired and displayed by the synonym acquisition unit;

A search sentence input unit that inputs, from an input device, a search sentence including the thesaurus terms designated by the search term designation unit;

A thesaurus term extraction unit that extracts the thesaurus terms included in the acquired document information acquired by the document information acquisition unit;

An output unit that outputs the thesaurus terms extracted by the thesaurus term extraction unit to an output device;

A document retrieval apparatus characterized by comprising the above units.
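The collective synonym lookup of claim 8 can be illustrated with a small sketch. It assumes, hypothetically, that the synonym database is a dictionary from each thesaurus term to its set of synonyms, and that a search term matches a thesaurus term when it equals the term or appears among its synonyms exactly (no case folding or stemming). The function name `thesaurus_terms_for` is illustrative only.

```python
from typing import Dict, List, Set


def thesaurus_terms_for(search_terms: List[str],
                        synonym_db: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For every search term, list the thesaurus terms that have it as a synonym.

    All search terms are resolved in a single pass over the (hypothetical)
    synonym database, so the candidates can be displayed together.
    """
    result: Dict[str, List[str]] = {t: [] for t in search_terms}
    for thesaurus_term, synonyms in synonym_db.items():
        for t in result:
            # Exact-match assumption: the search term is the thesaurus term itself
            # or one of its registered synonyms.
            if t == thesaurus_term or t in synonyms:
                result[t].append(thesaurus_term)
    return {t: sorted(terms) for t, terms in result.items()}
```

The user would then designate one of the listed thesaurus terms as the search term, so that the search sentence uses the controlled vocabulary rather than a free-text word.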

[9] A document retrieval method for a document retrieval device that connects to a document database storing a set of document information including thesaurus terms and to a classification database storing each thesaurus term in association with one of the classifications gene, disease, compound, and tissue, and retrieves documents, wherein

The search sentence input unit performs a search sentence input process for inputting a search sentence from an input device, and the document information acquisition unit performs a document information acquisition process for acquiring, from the document database, document information corresponding to the search sentence input by the search sentence input unit,

The thesaurus term extraction unit performs a thesaurus term extraction process for extracting the thesaurus terms included in the acquired document information acquired by the document information acquisition unit,

The classification acquisition unit performs, for each thesaurus term extracted by the thesaurus term extraction unit, a classification acquisition process for acquiring the classification of that thesaurus term from the classification database, and

A document retrieval method, wherein the output unit performs an output process of associating the thesaurus terms extracted by the thesaurus term extraction unit with the classifications acquired by the classification acquisition unit and outputting them to an output device.
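The classification step of claim 9 amounts to a lookup followed by an associated output. In the sketch below, which is an assumption and not the patent's data model, the classification database is a plain dictionary from thesaurus term to one of the four classes named in the claim. Terms without an entry are paired with `None`. The helpers `classify_terms` and `group_by_class` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# The four classifications named in the claim.
CLASSES = ("gene", "disease", "compound", "tissue")


def classify_terms(extracted: List[str],
                   classification_db: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each extracted thesaurus term with its classification (None if unknown)."""
    return [(term, classification_db.get(term)) for term in extracted]


def group_by_class(pairs: List[Tuple[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Group the (term, classification) pairs by class for display."""
    groups: Dict[str, List[str]] = {c: [] for c in CLASSES}
    for term, cls in pairs:
        if cls in groups:  # unclassified terms are left out of the grouping
            groups[cls].append(term)
    return groups
```

Grouping the output by class is one plausible presentation: it lets a user pick, for example, a gene term to narrow a disease-oriented search.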

[10] A document retrieval method for a document retrieval device that connects to a document database storing a set of document information including thesaurus terms and to a classification database storing each thesaurus term in association with one of the classifications gene, disease, compound, and tissue, the classification being based on the hierarchical structure of the thesaurus terms, and retrieves documents, wherein

The classification acquisition unit performs a classification acquisition process for acquiring the classification of each thesaurus term extracted by the thesaurus term extraction unit.
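Claim 10 states only that the classification is based on the hierarchical structure of the thesaurus terms. One way to realize that, sketched here purely as an assumption, is to follow each term's parent links up to the top-level node of its tree and map that root to a class. The root names in `ROOT_CLASSES` and the function `classify_by_hierarchy` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical top-level nodes of the thesaurus, each naming one classification.
ROOT_CLASSES = {"Genes": "gene", "Diseases": "disease",
                "Chemicals": "compound", "Anatomy": "tissue"}


def classify_by_hierarchy(term: str, parent: Dict[str, str]) -> Optional[str]:
    """Climb parent links to the root of the term's tree and map the root to a class."""
    seen = set()
    node = term
    # Stop at a node with no parent (the root) or if a malformed cycle is detected.
    while node in parent and node not in seen:
        seen.add(node)
        node = parent[node]
    return ROOT_CLASSES.get(node)
```

Under this assumption, "adenocarcinoma" under "carcinoma" under "Diseases" is classified as a disease without needing its own entry, because the class is inherited from the root of its branch.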

[11] A document retrieval method for a document retrieval apparatus that retrieves documents by connecting to a document database that stores a set of document information including thesaurus terms, wherein

The term score calculation unit performs a term score calculation process for calculating, for each thesaurus term extracted by the thesaurus term extraction unit, a score equal to the ratio of (the proportion of the acquired document information that includes the thesaurus term) to (the proportion of the document information in the whole set that includes the thesaurus term),

A document retrieval method, wherein the output unit performs an output process of associating the thesaurus terms extracted by the thesaurus term extraction unit with the scores calculated by the term score calculation unit and outputting them to an output device.
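The score of claim 11 is a ratio of two proportions. Write H for the number of acquired (hit) documents and h for the number of those that contain the term. Write N for the number of documents in the whole set and n for the number of those that contain the term. The score is then (h / H) / (n / N). It is high when a term is rare in the collection but common among the hits, which matches the abstract. The sketch below assumes that collection-wide document frequencies are precomputed in a dictionary. `term_scores` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Set


def term_scores(hit_docs: List[Set[str]],
                collection_df: Dict[str, int],
                collection_size: int) -> Dict[str, float]:
    """Score each term as (h / H) / (n / N).

    hit_docs:        the terms of each acquired document (one set per document)
    collection_df:   n, the number of documents in the whole set containing each term
    collection_size: N, the number of documents in the whole set
    """
    hits = len(hit_docs)  # H
    if hits == 0:
        return {}
    # h: the number of hit documents containing each term (sets avoid double counting).
    hit_df = Counter(term for terms in hit_docs for term in terms)
    scores: Dict[str, float] = {}
    for term, h in hit_df.items():
        n = collection_df.get(term, 0)
        if n == 0:
            continue  # no collection statistics for this term; skip rather than divide by zero
        scores[term] = (h / hits) / (n / collection_size)
    return scores
```

For example, with N = 1000, a term found in 10 documents overall and in 2 of 4 hits scores (2/4) / (10/1000) = 50. A term found in 500 documents overall and in all 4 hits scores only 2.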

[12] The document retrieval method, wherein the document retrieval apparatus further comprises a storage device that stores a first threshold for the number of acquired document information items and a second threshold for the number of acquired document information items including the thesaurus term, and

In the term score calculation process,

The term score calculation unit acquires the first threshold and the second threshold from the storage device and, based on the first threshold, the second threshold, the number of acquired document information items, and the number of acquired document information items including the thesaurus term, calculates a low score for a thesaurus term when the number of acquired document information items is less than the first threshold and the number of acquired document information items including the thesaurus term is less than the second threshold.

The document retrieval method according to claim 11, characterized by the above.
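Claim 12 says only that the score is "calculated low" when both counts fall below their thresholds. It does not say by how much. The sketch below assumes a simple multiplicative penalty. The `penalty` value of 0.1 and the function name `adjusted_score` are hypothetical. A plausible motivation is that a ratio computed from very few hits is unstable: a term present in 1 of 2 hits can otherwise receive an inflated score.

```python
def adjusted_score(raw_score: float, hits: int, hits_with_term: int,
                   first_threshold: int, second_threshold: int,
                   penalty: float = 0.1) -> float:
    """Lower the score when both the hit set and the term's hit count are small.

    hits:           number of acquired documents (compared with the first threshold)
    hits_with_term: number of acquired documents containing the term (second threshold)
    penalty:        hypothetical damping factor; the claim does not fix a value
    """
    if hits < first_threshold and hits_with_term < second_threshold:
        return raw_score * penalty
    return raw_score
```

Both conditions must hold. A small hit set alone, or a rare term within a large hit set, leaves the score unchanged.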

[13] A document retrieval method for a document retrieval device that retrieves documents by connecting to a document database that stores a set of document information including thesaurus terms, wherein

The thesaurus term extraction unit performs a thesaurus term extraction process for extracting the thesaurus terms included in the acquired document information acquired by the document information acquisition unit and storing them in the storage device,

The duplicate term number calculation unit performs a duplicate term number calculation process for calculating how many of the thesaurus terms extracted this time by the thesaurus term extraction unit are the same as thesaurus terms previously extracted by the thesaurus term extraction unit and stored in the storage device,

The output unit performs an output process for outputting the number of thesaurus terms calculated by the duplicate term number calculation unit to the output device.

A document retrieval method characterized by the above processes.
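The duplicate term count of claim 13 compares the terms extracted by the current search with all terms stored from earlier searches. In this sketch an in-memory set stands in for the storage device, which is an assumption, and the class `TermHistory` is hypothetical. Duplicates within one search are counted once, because the extracted terms are first reduced to a set.

```python
from typing import Iterable, Set


class TermHistory:
    """Hypothetical stand-in for the storage device holding previously extracted terms."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def record(self, extracted: Iterable[str]) -> int:
        """Return how many of this search's distinct terms were extracted before,
        then store them for later searches."""
        current = set(extracted)
        duplicates = len(current & self._seen)
        self._seen |= current
        return duplicates
```

Across successive re-searches, the returned count shows how much of each new result's vocabulary was already seen in earlier results.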

[14] A document retrieval method for a document retrieval apparatus that retrieves documents by connecting to a document database that stores a set of document information including thesaurus terms and to a hierarchy database that stores hierarchy information of the thesaurus terms, wherein

The thesaurus hierarchy acquisition unit performs a thesaurus hierarchy acquisition process for acquiring, from the hierarchy database, hierarchy information about the thesaurus terms extracted by the thesaurus term extraction unit,

The designated hierarchy input unit performs a designated hierarchy input process for inputting the designation of a thesaurus term hierarchy level,

The thesaurus term selection unit performs a thesaurus term selection process for selecting, based on the hierarchy information acquired by the thesaurus hierarchy acquisition unit, the thesaurus terms corresponding to the hierarchy level input by the designated hierarchy input unit from among the thesaurus terms extracted by the thesaurus term extraction unit,

The output unit performs an output process for outputting the thesaurus terms selected by the thesaurus term selection unit to an output device.

A document retrieval method characterized by the above processes.
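The level-based selection of claim 14 needs a notion of "hierarchy level". The sketch below assumes, hypothetically, that the hierarchy database supplies a child-to-parent map, and that top-level terms are level 1, their children level 2, and so on. The functions `depth` and `select_by_level` are illustrative names.

```python
from typing import Dict, List


def depth(term: str, parent: Dict[str, str]) -> int:
    """Level of `term` in the thesaurus tree; terms without a parent are level 1."""
    level = 1
    seen = {term}
    while term in parent:
        term = parent[term]
        if term in seen:  # malformed (cyclic) hierarchy; stop climbing
            break
        seen.add(term)
        level += 1
    return level


def select_by_level(extracted: List[str], parent: Dict[str, str], level: int) -> List[str]:
    """Keep only the extracted terms that sit at the designated hierarchy level."""
    return [t for t in extracted if depth(t, parent) == level]
```

Selecting a shallow level yields broad concepts for an overview of the hits. Selecting a deeper level yields specific terms that are more useful for narrowing the search.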

[15] A document retrieval method for a document retrieval device for retrieving documents by connecting to a document database storing a set of document information including thesaurus terms, wherein

The search method input unit performs a search method input process for inputting, for a search sentence including a thesaurus term, the designation of one of two search methods: a narrow-sense search method that uses the thesaurus term included in the search sentence as the search condition, or a broad-sense search method that uses the thesaurus term included in the search sentence together with the thesaurus terms below it in the hierarchy as the search condition,

The search sentence input unit performs a search sentence input process for inputting a search sentence including a thesaurus term,

The document information acquisition unit performs a document information acquisition process for acquiring, from the document database, document information corresponding to the search sentence input by the search sentence input unit, according to the search condition of the search method input by the search method input unit,

The output unit performs an output process for outputting the thesaurus terms extracted by the thesaurus term extraction unit to an output device.

A document retrieval method characterized by the above processes.

[16] A document retrieval method for a document retrieval apparatus that retrieves documents by connecting to a document database that stores a set of document information including thesaurus terms and to a synonym database that stores synonyms of thesaurus terms, wherein

The search term input unit performs a search term input process for inputting a plurality of search terms from an input device, and the synonym term acquisition unit performs a synonym term acquisition process for acquiring, from the synonym database, the thesaurus terms having each of the plurality of search terms input by the search term input unit as a synonym,

The designation unit performs a designation process for designating, as a search term, a thesaurus term acquired by the synonym term acquisition unit,

The search sentence input unit performs a search sentence input process for inputting, from the input device, a search sentence including the thesaurus term designated by the designation unit,

A document information acquisition unit performs document information acquisition processing for acquiring document information corresponding to the search sentence input by the search sentence input unit from the document database,

A document retrieval method characterized by the above processes.

[17] A document search program that causes a computer to execute the document search method according to any one of claims 9 to 16.

[18] A document retrieval device that retrieves documents by connecting to a document database that stores a set of document information and to a classification database that stores each document term in association with a specific classification,

A search sentence input unit for inputting a search sentence from an input device;

A document term extraction unit for extracting a document term included in the acquired document information acquired by the document information acquisition unit;

For each document term extracted by the document term extraction unit, a classification acquisition unit that acquires a classification of each document term from the classification database;

An output unit that outputs, to an output device, the document terms extracted by the document term extraction unit in association with the classifications acquired by the classification acquisition unit;

A document retrieval apparatus characterized by comprising the above units.

[19] A document retrieval device that connects to a document database that stores a set of document information and retrieves documents.

A search text input unit for inputting a search text from an input device;

A term score calculation unit for calculating, for each document term extracted by the document term extraction unit, a score equal to the ratio of (the proportion of the acquired document information that includes the document term) to (the proportion of the document information in the whole set that includes the document term);

An output unit that associates the document term extracted by the document term extraction unit with the score calculated by the term score calculation unit, and outputs it to an output device;

A document retrieval apparatus characterized by comprising the above units.

[20] The document retrieval device further includes:

A storage device for storing a first threshold for the number of acquired document information and a second threshold for the number of acquired document information including the document term;

The term score calculation unit

acquires the first threshold and the second threshold from the storage device and, based on the first threshold, the second threshold, the number of acquired document information items, and the number of acquired document information items including the document term, calculates a low score for a document term when the number of acquired document information items is less than the first threshold and the number of acquired document information items including the document term is less than the second threshold.

The document retrieval apparatus according to claim 19, characterized by the above.

[21] A document retrieval device that connects to a document database that stores a set of document information and retrieves documents.

A search sentence input unit for inputting a search sentence from an input device;

A document information acquisition unit that acquires, from the document database, document information corresponding to the search sentence input by the search sentence input unit;

A document term extraction unit that extracts a document term included in the acquired document information acquired by the document information acquisition unit and stores it in a storage device;

A duplicate term number calculation unit for calculating how many of the document terms extracted this time by the document term extraction unit are the same as document terms previously extracted by the document term extraction unit and stored in the storage device;

An output unit that outputs the number of document terms calculated by the duplicate term number calculation unit to an output device;

A document retrieval apparatus characterized by comprising the above units.

[22] A document retrieval method of a document retrieval apparatus for retrieving a document by connecting to a document database storing a set of document information and a classification database storing each document term in association with a specific classification.

A document term extraction unit performs a document term extraction process for extracting a document term included in the acquired document information acquired by the document information acquisition unit,

For each document term extracted by the document term extraction unit, the classification acquisition unit performs a classification acquisition process for acquiring the classification of that document term from the classification database, and the output unit performs an output process of associating the document terms extracted by the document term extraction unit with the classifications acquired by the classification acquisition unit and outputting them to the output device.

A document retrieval method characterized by the above processes.

[23] A document retrieval method for a document retrieval apparatus that connects to a document database that stores a set of document information and retrieves documents, wherein

The document term extraction unit performs a document term extraction process for extracting the document terms included in the acquired document information acquired by the document information acquisition unit,

The term score calculation unit performs a term score calculation process for calculating, for each document term extracted by the document term extraction unit, a score equal to the ratio of (the proportion of the acquired document information that includes the document term) to (the proportion of the document information in the whole set that includes the document term),

The output unit performs an output process in which the document term extracted by the document term extraction unit and the score calculated by the term score calculation unit are associated with each other and output to an output device.

A document retrieval method characterized by the above processes.

[24] The document retrieval method, wherein the document retrieval apparatus further comprises a storage device that stores a first threshold for the number of acquired document information items and a second threshold for the number of acquired document information items including the document term, and

In the term score calculation process,

The term score calculation unit acquires the first threshold and the second threshold from the storage device and, based on the first threshold, the second threshold, the number of acquired document information items, and the number of acquired document information items including the document term, calculates a low score for a document term when the number of acquired document information items is less than the first threshold and the number of acquired document information items including the document term is less than the second threshold.

The document retrieval method according to claim 23, characterized by the above.

[25] A document retrieval method for a document retrieval device that connects to a document database that stores a set of document information and retrieves documents.

A document term extraction unit performs a document term extraction process that extracts a document term included in the acquired document information acquired by the document information acquisition unit and stores it in a storage device,

The duplicate term number calculation unit performs a duplicate term number calculation process for calculating how many of the document terms extracted this time by the document term extraction unit are the same as document terms previously extracted by the document term extraction unit and stored in the storage device,

The output unit performs an output process of outputting the number of document terms calculated by the duplicate term number calculation unit to an output device.

A document retrieval method characterized by the above processes.

[26] A document search program that causes a computer to execute the document search method according to any one of claims 22 to 25.

[27] A document retrieval device that connects to a document database storing a set of document information including thesaurus terms and to a classification database storing each thesaurus term in association with one of the classifications gene, disease, compound, and tissue, and retrieves documents,

A search text input unit for inputting a search text from an input device;

A classification acquisition unit that acquires, for each thesaurus term extracted by the thesaurus term extraction unit, the classification of that thesaurus term from the classification database;

A term score calculation unit that calculates, for each thesaurus term extracted by the thesaurus term extraction unit, a score equal to the ratio of (the proportion of the acquired document information that includes the thesaurus term) to (the proportion of the document information in the whole set that includes the thesaurus term);

An output unit that outputs to an output device, together with the thesaurus terms extracted by the thesaurus term extraction unit, the classifications acquired by the classification acquisition unit and the scores calculated by the term score calculation unit, as information on relationships among molecules, diseases, and the like;

A document retrieval apparatus characterized by comprising the above units.
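Claim 27 combines the term, its classification, and its score in one output. The sketch below shows one possible presentation: a table sorted by descending score with ties broken alphabetically. The sort order, the `top_n` cutoff, the column layout, and the names `result_table` and `render` are all assumptions. The claim specifies only what is output.

```python
from typing import Dict, List, Optional, Tuple

# (term, classification or None, score)
Row = Tuple[str, Optional[str], float]


def result_table(scores: Dict[str, float],
                 classification_db: Dict[str, str],
                 top_n: int = 10) -> List[Row]:
    """Rows of (term, classification, score), highest score first, ties broken by term."""
    rows = [(term, classification_db.get(term), score) for term, score in scores.items()]
    rows.sort(key=lambda r: (-r[2], r[0]))
    return rows[:top_n]


def render(rows: List[Row]) -> str:
    """Format the rows as a fixed-width text table; '-' marks an unclassified term."""
    lines = [f"{'term':<20}{'class':<10}{'score':>8}"]
    for term, cls, score in rows:
        lines.append(f"{term:<20}{(cls or '-'):<10}{score:>8.2f}")
    return "\n".join(lines)
```

Showing the class next to the score lets a reader spot, for instance, a high-scoring gene term among the hits of a disease query. Such a pairing is the kind of molecule-disease relationship the claim refers to.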

[28] A document retrieval method for a document retrieval device that connects to a document database storing a set of document information including thesaurus terms and to a classification database storing each thesaurus term in association with one of the classifications gene, disease, compound, and tissue, and retrieves documents, wherein

The search sentence input unit performs a search sentence input process for inputting a search sentence from the input device, and the document information acquisition unit performs a document information acquisition process for acquiring, from the document database, the document information corresponding to the search sentence input by the search sentence input unit,

The term score calculation unit performs a term score calculation process for calculating, for each thesaurus term extracted by the thesaurus term extraction unit, a score equal to the ratio of (the proportion of the acquired document information that includes the thesaurus term) to (the proportion of the document information in the whole set that includes the thesaurus term),

A document retrieval method characterized in that the output unit performs an output process of outputting, to an output device, the classifications acquired by the classification acquisition unit and the scores calculated by the term score calculation unit, together with the thesaurus terms extracted by the thesaurus term extraction unit, as information on relationships among molecules, diseases, and the like.

[29] A document search program for causing a computer to execute the document search method according to claim 28.