WO2007060726A1 - Document retrieval device, method, and program - Google Patents

Document retrieval device, method, and program

Info

Publication number
WO2007060726A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
thesaurus
raster
search
term
Prior art date
Application number
PCT/JP2005/021652
Other languages
French (fr)
Japanese (ja)
Inventor
Shigeki Tanishima
Hiroko Matsui
Original Assignee
Mitsubishi Space Software Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Space Software Co., Ltd.
Priority to PCT/JP2005/021652 (WO2007060726A1)
Priority to JP2006515526A (JP4398461B2)
Publication of WO2007060726A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/35: Clustering; Classification

Definitions

  • The present invention relates to a document search device, a document search method, and a document search program that support document search.
  • In conventional document search systems, the hit documents are merely displayed, and the user must think up and add the keywords for each re-search.
  • The user has had to come up with keywords suitable for narrowing down the results. For this reason, narrowing down a search with a conventional document search system has been very difficult.
  • When a simple search is performed with a simple keyword, a huge number of documents (for example, 10,000 or more) are hit, and it is difficult to select the important documents from such a large result set and to obtain the knowledge they contain.
  • A conventional medical literature search system is one example of a system in which narrowing down a search is difficult. It works as follows.
  • Medical literature databases include MEDLINE (MEDlars on LINE), provided by the National Library of Medicine (NLM), which contains more than 10 million items of medical literature information.
  • MEDLINE: MEDlars on LINE
  • NLM: National Library of Medicine
  • PubMed: a document search system
  • Users access a web page that provides an interface to PubMed and search the medical literature information recorded in MEDLINE.
  • MEDLINE's bibliographic information includes medical terms (MeSH terms: Medical Subject Headings), the terms or names of proteins and compounds (Substance), the RegistryNumber, and so on, which serve as keys when searching.
  • MeSH terms and Substance information are recorded in the MeSH database, which, like MEDLINE, is accessible through PubMed.
  • The MeSH database consists of three databases: Descriptor, Qualifier, and SCR (Supplementary Concept Records).
  • For each MeSH term (Descriptor term), the Descriptor database stores information such as the Heading, which is a name that uniquely identifies the MeSH term, synonyms of the Heading, a TreeNumber that identifies the position of the MeSH term in the hierarchical structure, and a description of the MeSH term.
  • The SCR database stores information on Substance terms (SCR terms) in association with the MeSH terms stored in the Descriptor database.
  • OMIM: Online Mendelian Inheritance in Man
  • RegistryNumber: CAS registration number
  • CAS: chemical information database
  • Patent Document 1: Japanese Patent Laid-Open No. 2003-044481
  • The present invention aims, for example, to support document search over a large amount of document information by narrowing that information down so that useful document information can be acquired. It also aims to promote new medical and molecular-biological discoveries concerning the relationships between molecules and diseases.
  • The document retrieval apparatus of the present invention is connected to a document database that stores a collection of document information including thesaurus terms, and to a classification database that stores each thesaurus term in association with a classification of gene, disease, compound, or tissue. It comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms included in the document information acquired by the document information acquisition unit; a classification acquisition unit that acquires, for each thesaurus term extracted by the thesaurus term extraction unit, the classification of that term from the classification database; and an output unit that associates each extracted thesaurus term with the classification acquired by the classification acquisition unit and outputs them to an output device.
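The extraction and classification steps of this apparatus can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the in-memory dictionaries standing in for the document and classification databases, the `terms` field, the `"unclassified"` fallback label, and the sample data are all assumptions made for the example.

```python
from typing import Dict, Iterable, List


def extract_terms(documents: Iterable[dict]) -> List[str]:
    """Collect the distinct thesaurus terms of the hit documents, in first-seen order."""
    seen: Dict[str, None] = {}
    for doc in documents:
        for term in doc.get("terms", []):
            seen.setdefault(term, None)
    return list(seen)


def group_by_classification(terms: Iterable[str],
                            classification_db: Dict[str, str]) -> Dict[str, List[str]]:
    """Associate each extracted term with its classification (gene, disease, ...)."""
    groups: Dict[str, List[str]] = {}
    for term in terms:
        # Terms missing from the classification database get a hypothetical fallback label.
        label = classification_db.get(term, "unclassified")
        groups.setdefault(label, []).append(term)
    return groups


# Illustrative data only; real records would come from the document database.
hits = [
    {"id": 1, "terms": ["Gene X", "Disease Y"]},
    {"id": 2, "terms": ["Compound Z", "Gene X"]},
]
classification_db = {"Gene X": "gene", "Disease Y": "disease", "Compound Z": "compound"}

for label, terms in group_by_classification(extract_terms(hits), classification_db).items():
    print(label, terms)
```

Grouping the output by classification is one possible way to present the association; the text only requires that each term be output together with its classification.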
  • The document retrieval apparatus of the present invention is connected to a document database that stores a collection of document information including thesaurus terms, and to a classification database that stores each thesaurus term in association with a classification of gene, disease, compound, or tissue that is based on the hierarchical structure of the thesaurus terms. It comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms included in the document information acquired by the document information acquisition unit; a classification acquisition unit that acquires, for each thesaurus term extracted by the thesaurus term extraction unit, the classification of that term from the classification database; and an output unit that associates each extracted thesaurus term with the classification acquired by the classification acquisition unit and outputs them to an output device.
  • The document search device of the present invention is connected to a document database that stores a set of document information including thesaurus terms. It comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms included in the document information acquired by the document information acquisition unit; a term score calculation unit that calculates, for each thesaurus term extracted by the thesaurus term extraction unit, a score that is the ratio of (a) the proportion of the acquired document information that includes the thesaurus term to (b) the proportion of all document information in the set that includes the thesaurus term; and an output unit that associates each extracted thesaurus term with the score calculated by the term score calculation unit and outputs them to an output device.
  • The document search device further comprises a storage device that stores a first threshold for the number of pieces of acquired document information and a second threshold for the number of pieces of acquired document information that include the thesaurus term. The term score calculation unit acquires the first threshold and the second threshold from the storage device and, based on the two thresholds, the number of pieces of acquired document information, and the number of pieces of acquired document information that include the thesaurus term, calculates a low score for any thesaurus term for which the number of pieces of acquired document information is less than the first threshold and the number of pieces of acquired document information that include the term is less than the second threshold.
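The term score and the threshold rule can be sketched as below. The score is the share of hit documents containing the term divided by the share of all documents containing it. The text says only that the score is "calculated low" when both counts fall under their thresholds; multiplying by a `penalty` factor is an assumption of this sketch, as are the function and parameter names.

```python
def term_score(n_hits_with_term: int,
               n_hits: int,
               n_all_with_term: int,
               n_all: int,
               first_threshold: int = 0,
               second_threshold: int = 0,
               penalty: float = 0.0) -> float:
    """Score a term by how over-represented it is in the search result.

    score = (n_hits_with_term / n_hits) / (n_all_with_term / n_all)
    """
    if n_hits == 0 or n_all == 0 or n_all_with_term == 0:
        return 0.0  # no meaningful ratio can be formed

    hit_ratio = n_hits_with_term / n_hits   # share of hits containing the term
    base_ratio = n_all_with_term / n_all    # share of the whole collection containing it
    score = hit_ratio / base_ratio

    # Small result sets make the ratio unreliable, so lower the score when both
    # counts fall below their thresholds (the penalty factor is hypothetical).
    if n_hits < first_threshold and n_hits_with_term < second_threshold:
        score *= penalty
    return score


# A term found in 5 of 10 hits but in only 250 of 1000 documents overall.
print(term_score(5, 10, 250, 1000))
# The same counts with thresholds of 20 hits and 10 term hits, halving the score.
print(term_score(5, 10, 250, 1000, first_threshold=20, second_threshold=10, penalty=0.5))
```

A score above 1 means the term appears more often among the hits than in the collection as a whole, which is what makes it a candidate keyword for narrowing the search.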
  • The document search device of the present invention is connected to a document database that stores a set of document information including thesaurus terms. It comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms included in the document information acquired by the document information acquisition unit and stores them in a storage device; a duplicate term number calculation unit that calculates how many of the thesaurus terms extracted this time by the thesaurus term extraction unit are the same as thesaurus terms extracted in the past and stored in the storage device; and an output unit that outputs the number of thesaurus terms calculated by the duplicate term number calculation unit to an output device.
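The duplicate term count amounts to a set intersection between the terms of the current search and the terms stored from earlier searches. A minimal sketch follows, with an in-memory set standing in for the storage device; the class and method names are hypothetical.

```python
from typing import Iterable, Set


class TermHistory:
    """Remembers thesaurus terms from past searches (stand-in for the storage device)."""

    def __init__(self) -> None:
        self._past_terms: Set[str] = set()

    def record_and_count(self, extracted_terms: Iterable[str]) -> int:
        """Return how many of the current terms were already seen, then store them."""
        current = set(extracted_terms)
        duplicates = len(current & self._past_terms)
        self._past_terms |= current
        return duplicates


history = TermHistory()
print(history.record_and_count(["A", "B"]))       # first search: nothing stored yet
print(history.record_and_count(["B", "C", "A"]))  # "A" and "B" were seen before
```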
  • The document retrieval apparatus of the present invention is connected to a document database that stores a set of document information including thesaurus terms, and to a hierarchy database that stores hierarchy information for the thesaurus terms. It comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms included in the document information acquired by the document information acquisition unit; a thesaurus hierarchy acquisition unit that acquires, from the hierarchy database, hierarchy information about the thesaurus terms extracted by the thesaurus term extraction unit; a specified hierarchy input unit that inputs a specification of a hierarchy level of the thesaurus terms; a thesaurus term selection unit that selects, from the thesaurus terms extracted by the thesaurus term extraction unit, the thesaurus terms corresponding to the hierarchy level input by the specified hierarchy input unit, based on the hierarchy information acquired by the thesaurus hierarchy acquisition unit; and an output unit that outputs the thesaurus terms selected by the thesaurus term selection unit to an output device.
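Selecting terms by hierarchy level can be sketched as below, assuming the hierarchy information takes the form of dot-separated tree numbers in the style of the MeSH TreeNumber mentioned earlier, so that the level equals the number of segments. The dictionary standing in for the hierarchy database and the sample tree numbers are illustrative only.

```python
from typing import Dict, Iterable, List


def level_of(tree_number: str) -> int:
    """Depth of a dot-separated tree number, e.g. 'C04.557' is at level 2."""
    return tree_number.count(".") + 1


def select_terms_at_level(terms: Iterable[str],
                          hierarchy_db: Dict[str, List[str]],
                          level: int) -> List[str]:
    """Keep the extracted terms that sit at the requested hierarchy level."""
    return [
        term for term in terms
        # A term may appear at several places in the hierarchy; one match is enough.
        if any(level_of(tn) == level for tn in hierarchy_db.get(term, []))
    ]


# Illustrative hierarchy data (term -> tree numbers).
hierarchy_db = {
    "Neoplasms": ["C04"],
    "Leukemia": ["C04.557.337"],
    "Heart Diseases": ["C14.280"],
}
extracted = ["Neoplasms", "Leukemia", "Heart Diseases"]
print(select_terms_at_level(extracted, hierarchy_db, 2))
```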
  • The document search apparatus of the present invention is connected to a document database that stores a set of document information including thesaurus terms. It comprises: a search method input unit that inputs, from an input device, a designation of one of two search methods for a search sentence including a thesaurus term, namely a narrow-sense search method that uses the thesaurus term included in the search sentence as the search condition, and a broad-sense search method that uses both the thesaurus term included in the search sentence and the thesaurus terms below it in the hierarchy as the search conditions; a search sentence input unit that inputs a search sentence including a thesaurus term from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit, according to the search conditions of the search method input by the search method input unit; a thesaurus term extraction unit that extracts the thesaurus terms included in the document information acquired by the document information acquisition unit; and an output unit that outputs the thesaurus terms extracted by the thesaurus term extraction unit to an output device.
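The difference between the narrow-sense and broad-sense search methods can be sketched as a query expansion step. This sketch assumes the terms below a thesaurus term are those whose tree number extends the term's own tree number; the data layout, function names, and sample values are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Set


def expand_query_term(term: str,
                      hierarchy_db: Dict[str, List[str]],
                      method: str) -> Set[str]:
    """Return the search conditions for a narrow-sense or broad-sense search."""
    if method == "narrow":
        return {term}
    if method != "broad":
        raise ValueError(f"unknown search method: {method!r}")

    prefixes = [tn + "." for tn in hierarchy_db.get(term, [])]
    expanded = {term}
    for other, tree_numbers in hierarchy_db.items():
        # A descendant has a tree number that starts with the parent's number plus a dot.
        if any(tn.startswith(p) for tn in tree_numbers for p in prefixes):
            expanded.add(other)
    return expanded


def search(documents: Iterable[dict], conditions: Set[str]) -> List[dict]:
    """Return documents indexed with at least one of the condition terms."""
    return [doc for doc in documents if conditions & set(doc["terms"])]


hierarchy_db = {
    "Neoplasms": ["C04"],
    "Leukemia": ["C04.557.337"],
    "Heart Diseases": ["C14.280"],
}
documents = [
    {"id": 1, "terms": ["Leukemia"]},
    {"id": 2, "terms": ["Neoplasms"]},
    {"id": 3, "terms": ["Heart Diseases"]},
]
for method in ("narrow", "broad"):
    conditions = expand_query_term("Neoplasms", hierarchy_db, method)
    print(method, [doc["id"] for doc in search(documents, conditions)])
```

The broad-sense method returns a superset of the narrow-sense result, since the original term always remains among the search conditions.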
  • The document retrieval apparatus of the present invention is connected to a document database that stores a set of document information including thesaurus terms, and to a synonym database that stores synonyms of the thesaurus terms. It comprises: a search term input unit that inputs a plurality of search words; a synonym term acquisition unit that acquires, from the synonym database, the thesaurus terms that have each of the plurality of search words input by the search term input unit as a synonym, and displays them together on a display device; a search term designating unit that designates a thesaurus term to be used as a search term from among the thesaurus terms acquired and displayed by the synonym term acquisition unit; a search sentence input unit that inputs, from an input device, a search sentence including the thesaurus term designated by the search term designating unit; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms included in the document information acquired by the document information acquisition unit; and an output unit that outputs the thesaurus terms extracted by the thesaurus term extraction unit to an output device.
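The synonym lookup that precedes the search can be sketched as follows. Matching search words case-insensitively against both the heading and its synonyms is an assumption of this sketch, and the dictionary standing in for the synonym database holds illustrative values only.

```python
from typing import Dict, Iterable, List


def thesaurus_terms_for(search_words: Iterable[str],
                        synonym_db: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """For each search word, list the thesaurus terms that have it as a synonym."""
    candidates: Dict[str, List[str]] = {}
    for word in search_words:
        key = word.casefold()
        candidates[word] = [
            term for term, synonyms in synonym_db.items()
            # Treat the heading itself as matching, in addition to its synonyms.
            if key == term.casefold() or key in (s.casefold() for s in synonyms)
        ]
    return candidates


# Illustrative synonym data (thesaurus term -> synonyms).
synonym_db = {
    "Neoplasms": ["Cancer", "Tumors"],
    "Aspirin": ["Acetylsalicylic Acid"],
}
# All search words are resolved in one pass so the candidates can be shown together.
for word, terms in thesaurus_terms_for(["cancer", "aspirin"], synonym_db).items():
    print(word, "->", terms)
```

The user would then designate one of the displayed thesaurus terms for each word, and the designated terms would form the search sentence.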
  • The document search method of the present invention is performed by a document search apparatus that searches documents while connected to a document database that stores a collection of document information including thesaurus terms, and to a classification database that stores each thesaurus term in association with a classification of gene, disease, compound, or tissue. In this method: the search sentence input unit performs search sentence input processing that inputs a search sentence from an input device; the document information acquisition unit performs document information acquisition processing that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; the thesaurus term extraction unit performs thesaurus term extraction processing that extracts the thesaurus terms included in the document information acquired by the document information acquisition unit; the classification acquisition unit performs classification acquisition processing that acquires, for each thesaurus term extracted by the thesaurus term extraction unit, the classification of that term from the classification database; and the output unit performs output processing that associates each thesaurus term extracted by the thesaurus term extraction unit with the classification acquired by the classification acquisition unit and outputs them to an output device.
  • The document search method of the present invention is performed by a document search apparatus that searches documents while connected to a document database that stores a collection of document information including thesaurus terms, and to a classification database that stores each thesaurus term in association with a classification of gene, disease, compound, or tissue that is based on the hierarchical structure of the thesaurus terms. In this method: the search sentence input unit performs search sentence input processing that inputs a search sentence from an input device; the document information acquisition unit performs document information acquisition processing that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; the thesaurus term extraction unit performs thesaurus term extraction processing that extracts the thesaurus terms included in the document information acquired by the document information acquisition unit; the classification acquisition unit performs classification acquisition processing that acquires, for each thesaurus term extracted by the thesaurus term extraction unit, the classification of that term from the classification database; and the output unit performs output processing that associates each thesaurus term extracted by the thesaurus term extraction unit with the classification acquired by the classification acquisition unit and outputs them to an output device.
  • The document search method of the present invention is performed by a document search device that searches documents while connected to a document database that stores a collection of document information including thesaurus terms. In this method: the search sentence input unit performs search sentence input processing that inputs a search sentence from an input device; the document information acquisition unit performs document information acquisition processing that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; the thesaurus term extraction unit performs thesaurus term extraction processing that extracts the thesaurus terms included in the document information acquired by the document information acquisition unit; the term score calculation unit performs term score calculation processing that calculates, for each thesaurus term extracted by the thesaurus term extraction unit, a score that is the ratio of the proportion of the acquired document information that includes the thesaurus term to the proportion of all document information in the collection that includes the thesaurus term; and the output unit performs output processing that associates each thesaurus term extracted by the thesaurus term extraction unit with the score calculated by the term score calculation unit and outputs them to an output device.
  • In the document search method, the document search device further includes a storage device that stores a first threshold for the number of pieces of acquired document information and a second threshold for the number of pieces of acquired document information that include the thesaurus term. The term score calculation unit acquires the first threshold and the second threshold from the storage device and, based on the two thresholds, the number of pieces of acquired document information, and the number of pieces of acquired document information that include the thesaurus term, calculates a low score for any thesaurus term for which the number of pieces of acquired document information is less than the first threshold and the number of pieces of acquired document information that include the term is less than the second threshold.
  • The document search method of the present invention is performed by a document search device that searches documents while connected to a document database that stores a collection of document information including thesaurus terms. In this method: the search sentence input unit performs search sentence input processing that inputs a search sentence from an input device; the document information acquisition unit performs document information acquisition processing that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; the thesaurus term extraction unit performs thesaurus term extraction processing that extracts the thesaurus terms included in the document information acquired by the document information acquisition unit and stores them in a storage device; the duplicate term number calculation unit performs duplicate term number calculation processing that calculates how many of the thesaurus terms extracted this time by the thesaurus term extraction unit are the same as thesaurus terms extracted in the past and stored in the storage device; and the output unit performs output processing that outputs the number of thesaurus terms calculated by the duplicate term number calculation unit to an output device.
  • The document search method of the present invention is performed by a document search apparatus that searches documents while connected to a document database that stores a collection of document information including thesaurus terms, and to a hierarchy database that stores hierarchy information for the thesaurus terms. In this method: the search sentence input unit performs search sentence input processing that inputs a search sentence from an input device; the document information acquisition unit performs document information acquisition processing that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; the thesaurus term extraction unit performs thesaurus term extraction processing that extracts the thesaurus terms included in the document information acquired by the document information acquisition unit; the thesaurus hierarchy acquisition unit performs thesaurus hierarchy acquisition processing that acquires, from the hierarchy database, hierarchy information about the thesaurus terms extracted by the thesaurus term extraction unit; the specified hierarchy input unit performs specified hierarchy input processing that inputs a specification of a hierarchy level of the thesaurus terms; the thesaurus term selection unit performs thesaurus term selection processing that selects, from the thesaurus terms extracted by the thesaurus term extraction unit, the thesaurus terms corresponding to the hierarchy level input by the specified hierarchy input unit, based on the hierarchy information acquired by the thesaurus hierarchy acquisition unit; and the output unit performs output processing that outputs the thesaurus terms selected by the thesaurus term selection unit to an output device.
  • The document search method of the present invention is performed by a document search device that searches documents while connected to a document database that stores a collection of document information including thesaurus terms. In this method: the search method input unit performs search method input processing that inputs a designation of either a narrow-sense search method, which uses the thesaurus term included in the search sentence as the search condition, or a broad-sense search method, which uses both the thesaurus term included in the search sentence and the thesaurus terms below it in the hierarchy as the search conditions; the search sentence input unit performs search sentence input processing that inputs a search sentence including a thesaurus term from an input device; the document information acquisition unit performs document information acquisition processing that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit, according to the search conditions of the search method input by the search method input unit; the thesaurus term extraction unit performs thesaurus term extraction processing that extracts the thesaurus terms included in the document information acquired by the document information acquisition unit; and the output unit performs output processing that outputs the thesaurus terms extracted by the thesaurus term extraction unit to an output device.
  • The document search method of the present invention is performed by a document search device that searches documents while connected to a document database that stores a collection of document information including thesaurus terms, and to a synonym database that stores synonyms of the thesaurus terms. In this method: the search term input unit performs search term input processing that inputs a plurality of search words from an input device; the synonym term acquisition unit performs synonym term acquisition processing that acquires, from the synonym database, the thesaurus terms that have each of the plurality of search words input by the search term input unit as a synonym, and displays them together on a display device; the search term designating unit performs search term designation processing that designates a thesaurus term to be used as a search term from among the displayed thesaurus terms; the search sentence input unit performs search sentence input processing that inputs a search sentence including the thesaurus term designated by the search term designating unit; the document information acquisition unit performs document information acquisition processing that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; the thesaurus term extraction unit performs thesaurus term extraction processing that extracts the thesaurus terms included in the document information acquired by the document information acquisition unit; and the output unit performs output processing that outputs the thesaurus terms extracted by the thesaurus term extraction unit to an output device.
  • The document search program of the present invention causes a computer to execute any one of the document search methods described above.
  • The document retrieval apparatus of the present invention is connected to a document database that stores a set of document information, and to a classification database that stores each document term in association with a specific classification. It comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a document term extraction unit that extracts the document terms included in the document information acquired by the document information acquisition unit; a classification acquisition unit that acquires, for each document term extracted by the document term extraction unit, the classification of that term from the classification database; and an output unit that associates each document term extracted by the document term extraction unit with the classification acquired by the classification acquisition unit and outputs them to an output device.
  • The document retrieval apparatus of the present invention is connected to a document database that stores a set of document information. It comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a document term extraction unit that extracts the document terms included in the document information acquired by the document information acquisition unit; a term score calculation unit that calculates, for each document term extracted by the document term extraction unit, a score that is the ratio of the proportion of the acquired document information that includes the document term to the proportion of all document information in the set that includes the document term; and an output unit that associates each document term extracted by the document term extraction unit with the score calculated by the term score calculation unit and outputs them to an output device.
  • The document retrieval apparatus further comprises a storage device that stores a first threshold for the number of pieces of acquired document information and a second threshold for the number of pieces of acquired document information that include the document term. The term score calculation unit acquires the first threshold and the second threshold from the storage device and, based on the two thresholds, the number of pieces of acquired document information, and the number of pieces of acquired document information that include the document term, calculates a low score for any document term for which the number of pieces of acquired document information is less than the first threshold and the number of pieces of acquired document information that include the term is less than the second threshold.
  • The document retrieval apparatus of the present invention is connected to a document database that stores a set of document information. It comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a document term extraction unit that extracts the document terms included in the document information acquired by the document information acquisition unit and stores them in a storage device; a duplicate term number calculation unit that calculates how many of the document terms extracted this time by the document term extraction unit are the same as document terms extracted in the past and stored in the storage device; and an output unit that outputs the number of document terms calculated by the duplicate term number calculation unit to an output device.
  • The document search method of the present invention is performed by a document search apparatus that searches documents while connected to a document database that stores a set of document information, and to a classification database that stores each document term in association with a specific classification. In this method: the search sentence input unit performs search sentence input processing that inputs a search sentence from an input device; the document information acquisition unit performs document information acquisition processing that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; the document term extraction unit performs document term extraction processing that extracts the document terms included in the document information acquired by the document information acquisition unit; the classification acquisition unit performs classification acquisition processing that acquires, for each document term extracted by the document term extraction unit, the classification of that term from the classification database; and the output unit performs output processing that associates each document term extracted by the document term extraction unit with the classification acquired by the classification acquisition unit and outputs them to an output device.
  • The document search method of the present invention is performed by a document search device that searches documents while connected to a document database that stores a set of document information. In this method: the search sentence input unit performs search sentence input processing that inputs a search sentence from an input device; the document information acquisition unit performs document information acquisition processing that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; the document term extraction unit performs document term extraction processing that extracts the document terms included in the document information acquired by the document information acquisition unit; the term score calculation unit performs term score calculation processing that calculates, for each document term extracted by the document term extraction unit, a score that is the ratio of the proportion of the acquired document information that includes the document term to the proportion of all document information in the set that includes the document term; and the output unit performs output processing that associates each document term extracted by the document term extraction unit with the score calculated by the term score calculation unit and outputs them to an output device.
  • The document search device further includes a storage device that stores a first threshold for the number of acquired document information and a second threshold for the number of document information including the document term.
  • The term score calculation unit acquires the first threshold and the second threshold from the storage device and, based on the two thresholds, the number of acquired document information, and the number of acquired document information that includes the document term, calculates the term score to be low when the number of acquired document information is less than the first threshold and the number of acquired document information that includes the document term is less than the second threshold.
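  • As a rough illustration of the term score described above, the following Python sketch computes the ratio of a term's share among the acquired (hit) documents to its share in the whole document set. The function name, the parameter names, and the `penalty` factor are assumptions made for this example; the text states only that the score is calculated low below the thresholds, not how.

```python
def term_score(hits_with_term, hits_total, all_with_term, all_total,
               first_threshold=None, second_threshold=None, penalty=0.1):
    """Share of the term among hit documents divided by its share in the whole set."""
    if hits_total == 0 or all_total == 0 or all_with_term == 0:
        return 0.0  # no basis for a ratio
    score = (hits_with_term / hits_total) / (all_with_term / all_total)
    # Assumed handling of small samples: scale the score down when both
    # counts fall below their thresholds (the factor itself is hypothetical).
    if (first_threshold is not None and second_threshold is not None
            and hits_total < first_threshold
            and hits_with_term < second_threshold):
        score *= penalty
    return score


# A term found in 20 of 100 hits but in only 1,000 of 100,000 documents
# overall is over-represented among the hits, so its score is well above 1.
print(term_score(20, 100, 1000, 100000))
```

  • A score near 1 means the term is about as common among the hits as in the whole set; larger values suggest the term is characteristic of the hits.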
  • The document search method of the present invention is also a document search method of a document search device that searches for documents by connecting to a document database that stores a set of document information, in which the search sentence input unit inputs a search sentence from an input device and the document information acquisition unit acquires the corresponding document information from the document database.
  • The document term extraction unit extracts the document terms included in the acquired document information acquired by the document information acquisition unit and stores them in a storage device. The duplicate term number calculation unit calculates how many of the document terms extracted this time are the same as document terms extracted in the past and stored in the storage device, and the output unit performs an output process in which the number of document terms calculated by the duplicate term number calculation unit is output to an output device.
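  • The duplicate term count described above can be sketched as a set intersection between the terms extracted this time and those stored from earlier searches. The class and method names below are hypothetical, and an in-memory set stands in for the storage device.

```python
class SearchHistory:
    """Keeps the terms extracted in earlier searches (stands in for the storage device)."""

    def __init__(self):
        self.past_terms = set()

    def record(self, current_terms):
        """Return how many current terms were already extracted in the past, then store them."""
        current = set(current_terms)
        duplicates = len(current & self.past_terms)
        self.past_terms |= current
        return duplicates


history = SearchHistory()
first = history.record(["Neoplasms", "Apoptosis", "Liver"])
second = history.record(["Apoptosis", "Liver", "Genes"])
print(first, second)  # prints "0 2"
```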
  • a document search program of the present invention causes a computer to execute any one of the document search methods described above.
  • The document retrieval apparatus of the present invention is connected to a document database that stores a set of document information including thesaurus rasters, and to a classification database that stores each thesaurus raster in association with a classification of gene, disease, compound, or tissue.
  • It comprises a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires document information corresponding to the input search sentence from the document database; a thesaurus raster extraction unit that extracts the thesaurus rasters included in the acquired document information; and a classification acquisition unit that acquires the classification of each extracted thesaurus raster from the classification database.
  • It further comprises a term score calculation unit that calculates a score as the ratio of the proportion of acquired document information that includes the thesaurus raster to the proportion of document information in the whole set that includes the thesaurus raster, and an output unit that outputs, together with each thesaurus raster extracted by the thesaurus raster extraction unit, the classification acquired by the classification acquisition unit and the score calculated by the term score calculation unit to an output device as information on relationships among molecules, diseases, and the like.
  • The document retrieval method of the present invention is a document search method of a document search apparatus that searches for documents by connecting to a document database that stores a set of document information including thesaurus rasters and to a classification database that stores each thesaurus raster in association with a classification of gene, disease, compound, or tissue.
  • The search sentence input unit performs a search sentence input process that inputs a search sentence from an input device, and the document information acquisition unit performs a document information acquisition process that acquires document information corresponding to the input search sentence from the document database.
  • The thesaurus raster extraction unit extracts the thesaurus rasters included in the acquired document information acquired by the document information acquisition unit, and the classification acquisition unit performs a classification acquisition process that acquires the classification of each extracted thesaurus raster from the classification database.
  • The term score calculation unit performs a term score calculation process that calculates a score from the ratio of the number of acquired document information including the thesaurus raster, and the output unit performs an output process that outputs, together with each thesaurus raster extracted by the thesaurus raster extraction unit, the classification acquired by the classification acquisition unit and the score calculated by the term score calculation unit to the output device as information on relationships among molecules, diseases, and the like.
  • a document search program causes a computer to execute the document search method.
  • According to the present invention, the classification (gene, disease, compound, tissue) and the term score can be output, together with the thesaurus rasters (for example, MeSH terms) included in the hit literature information, as information for narrowing down medical literature information.
  • The classification and term score can also be output, together with the thesaurus rasters included in the hit literature information, as information on relationships among molecules, diseases, and the like, which can encourage medical and molecular-biological discovery.
  • The number of thesaurus rasters extracted this time that were also extracted in the past can be output, which makes the transition of the search apparent and thereby supports a narrowed-down literature search for obtaining useful literature information.
  • Thesaurus rasters of a specified hierarchy, among those included in the hit literature information, can be output as information for narrowing down medical literature information, which likewise supports a narrowed-down literature search for obtaining useful literature information.
  • Literature information is searched by whichever of the narrow search and broad search methods is specified, and the thesaurus rasters included in the hit literature information are output as information for narrowing down medical literature information, which again supports a narrowed-down literature search.
  • Thesaurus rasters that have the input search word as a synonym are acquired, and the corresponding literature information can be searched at once using the thesaurus rasters specified among them, which supports a narrowed-down literature search.
  • The classification and term score can also be output together with the literature terms included in the hit literature information as information for narrowing down the literature information, which supports a narrowed-down literature search for obtaining useful literature information.
  • The classification and term score can likewise be output together with the literature terms included in the hit literature information as information on relationships among molecules, diseases, and the like, which can encourage medical and molecular-biological discovery.
  • FIG. 1 is a configuration diagram of the document search device 100 according to the first embodiment.
  • the document retrieval apparatus 100 includes the following.
  • The user environment setting unit 101 inputs, from an input device, various setting information of the document search apparatus 100 for each user (hereinafter referred to as environment setting information), and stores the input environment setting information in the user information DB (database) 191.
  • The user information DB 191 stores, in a storage device, information related to each user of the document search device 100 (hereinafter referred to as user information), such as the login ID (identifier), password, and environment setting information, and manages access to the user information.
  • The MEDLINE update information acquisition unit 102 periodically accesses NLM's FTP (File Transfer Protocol) site, acquires information about the document information newly recorded (or scheduled to be recorded) in MEDLINE 200 (hereinafter referred to as update information), updates the term DB (database) 192 described below based on the acquired update information, and notifies users of updates to the MEDLINE 200 bibliographic information.
  • The term DB 192 stores, in a storage device, information related to each MeSH term (hereinafter referred to as MeSH term information) and manages access to it. MeSH term information comprises the Heading (hereinafter referred to as the representative term), the synonyms of the Heading (hereinafter referred to as synonymous terms), the MeSH TreeNumber (hereinafter referred to as the MeSH number), a description of the MeSH term, the medical-information classification to which the term belongs (gene, disease, compound, tissue, etc.; hereinafter referred to as the classification), and the number of documents related to the MeSH term among all documents recorded in MEDLINE (hereinafter referred to as the number of documents per term in all documents).
  • The search result storage unit 193 is a storage device that stores the search result information of MEDLINE 200 (hereinafter referred to as MEDLINE result information), which includes the document information corresponding to a query (search sentence) (hereinafter referred to as hit document information), and the search result information of the document search device 100 (hereinafter referred to as search result information).
  • The search result information comprises the MeSH terms (including SCR) extracted from the hit document information, information on those terms other than the representative term (hereinafter referred to as additional information), and information indicating the transition of the search (hereinafter referred to as search transition information).
  • The display change unit 103 inputs information specifying the display contents of the search result (hereinafter referred to as display designation information) from the input device, and outputs display information based on the display designation information (hereinafter referred to as display change information) in order to change the display of the search results.
  • The login unit 104 inputs the login ID and password specified by the user (hereinafter referred to as login information) from the input device and, based on the input login information and the user information stored in the user information DB 191, controls the user's access to the document search device 100.
  • the document search unit 110 generates a query based on user designation, accesses the MEDLINE 200 using the generated query, and acquires MEDLINE result information from the MEDLINE 200.
  • the search result generation unit 120 generates search result information based on the MEDLINE result information.
  • the generated search result information is stored in the search result storage unit 193.
  • the search result display unit 105 displays the search result on the display device based on the MEDLINE result information and the search result information, and changes the display of the search result based on the display change information.
  • FIG. 2 is a configuration diagram of the document search unit 110 according to the first embodiment.
  • the document search unit 110 includes the following.
  • The synonym search unit 111 inputs, from the input device, the term that the user first specifies in order to search the document information of MEDLINE 200 (hereinafter referred to as the search term), and acquires from the term DB 192 the representative terms of the MeSH terms that have the input search term as their representative term or as a synonymous term.
  • It then inputs from the input device the term, among the acquired representative terms, that the user specifies for use in the query to access MEDLINE 200 (hereinafter referred to as the designated term).
  • The search term input unit 112 inputs the search term from the input device.
  • The representative term acquisition unit 113 acquires from the term DB 192 the representative terms of the MeSH terms that have the search term as their representative term or as a synonymous term.
  • The designated term acquisition unit 114 inputs the designated term from the input device.
  • The query generation unit 115 generates a query based on the designated term and the environment setting information.
  • The MEDLINE access unit 116 inputs a query from the input device and uses the input query to access MEDLINE 200.
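  • The lookup performed by the representative term acquisition unit 113 can be pictured as a search over headings and their synonyms. The following Python sketch assumes a small in-memory dictionary in place of the term DB 192; the function name is hypothetical, and the entries other than "Neoplasms" are invented for illustration.

```python
# Hypothetical stand-in for the term DB: representative term -> synonymous terms.
TERM_DB = {
    "Neoplasms": ["cancers", "Benign"],
    "Liver Neoplasms": ["Hepatic Cancer"],
    "Apoptosis": ["Programmed Cell Death"],
}


def find_representative_terms(search_term, term_db):
    """Return representative terms whose heading or a synonym equals the search term."""
    key = search_term.casefold()
    matches = []
    for heading, synonyms in term_db.items():
        names = [heading] + list(synonyms)
        if any(key == name.casefold() for name in names):
            matches.append(heading)
    return matches


candidates = find_representative_terms("cancers", TERM_DB)
print(candidates)  # prints "['Neoplasms']"
```

  • The user would then pick the designated term from `candidates`; case-insensitive exact matching is a choice made for this sketch, not something the text specifies.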
  • FIG. 3 is a configuration diagram of the search result generation unit 120 in the first embodiment.
  • The search result generation unit 120 in the first embodiment will be described below based on FIG. 3.
  • the search result generation unit 120 includes the following.
  • the MeSH term extraction unit 121 extracts MeSH terms from hit document information (MeSH term extraction also means acquisition of representative terms corresponding to the extracted MeSH terms).
  • the additional information generation unit 122 generates additional information based on the MEDLINE result information.
  • the classification acquisition unit 123 acquires the extraction term classification from the term DB 192.
  • the term score calculation unit 124 calculates a score (term score) indicating the importance of the extracted term for the hit document.
  • the MeSH number acquisition unit 125 acquires the MeSH number of the extracted term from the term DB 192.
  • The search transition information generation unit 126 generates search transition information based on the current MEDLINE result information, extracted term information, and additional information, and on the past MEDLINE result information, extracted term information, and additional information.
  • FIG. 4 is a table showing MeSH term information stored and managed in the term DB 192 according to the first embodiment.
  • The term DB 192 stores in advance, as MeSH term information, a table (hereinafter referred to as the MeSH term table) that holds the “Heading”, “MeSH number”, “Term description”, “Classification”, “Number of documents per term in all documents”, and “Synonym terms” for each MeSH term, and manages access to the MeSH term information.
  • “Entry ID” is information for identifying MeSH term information for each MeSH term.
  • “Classification” is the medical-information classification, among gene, disease, compound (chemical), and tissue, to which the MeSH term indicated by the “representative term (Heading)” belongs. Gene is indicated as “G”, disease as “D”, compound as “C”, and tissue as “T”.
  • For example, the MeSH term whose representative term is “Neoplasms” is identified by the MeSH number “C04”, is classified as “Disease (D)”, has “135284” related documents among all the documents included in MEDLINE (number of documents per term in all documents), and has synonymous terms such as “cancers” and “Benign”. The MeSH term information with “Neoplasms” as the representative term is set in the entry identified by the entry ID “9028”.
  • “Entry ID” is set to a unique ID for each MeSH term whose MeSH term information is stored and managed.
  • the information stored in the MeSH database is set in “Heading”, “MeSH number”, “Term description”, and “Synonym term”.
  • “Number of documents per term in all documents” is set to the number of hit documents obtained when MEDLINE 200 is accessed with a query that combines, under an OR (logical sum) condition, the MeSH term set in “Heading” and all the MeSH terms set in “Synonymous terms” of the entry.
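  • To make the MeSH term table and the OR query concrete, the following Python sketch models one table entry and builds the OR condition from its heading and synonyms. The class name, field names, and the quoted `OR` query syntax are assumptions for illustration and are not taken from MEDLINE's actual query language; the sample values are those of the “Neoplasms” example above.

```python
from dataclasses import dataclass, field


@dataclass
class MeshTermEntry:
    entry_id: int
    heading: str             # representative term
    mesh_number: str
    classification: str      # "G", "D", "C", or "T"
    docs_in_all: int         # number of documents per term in all documents
    synonyms: list = field(default_factory=list)
    description: str = ""


def build_or_query(entry):
    """Join the heading and all synonyms with OR (the query syntax is assumed)."""
    terms = [entry.heading] + entry.synonyms
    return " OR ".join(f'"{t}"' for t in terms)


neoplasms = MeshTermEntry(9028, "Neoplasms", "C04", "D", 135284, ["cancers", "Benign"])
print(build_or_query(neoplasms))  # prints: "Neoplasms" OR "cancers" OR "Benign"
```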
  • Figures 5, 6, 7, 8, and 9 are MeSH Descriptor tree diagrams.
  • “G” is set in the term DB 192 for the lowermost Descriptor terms among the lower-level Descriptor terms, as MeSH terms classified as genes (hereinafter referred to as gene classification terms).
  • “G” is also set in the term DB 192 for SCR terms mapped to a Descriptor term classified as a gene, as gene classification terms.
  • “D” is set in the term DB 192 for MeSH terms [C01] to [C04] and [C05] to [C23] and their lower-level Descriptor terms, as MeSH terms classified as diseases (hereinafter referred to as disease classification terms).
  • “C” is set in the term DB 192 for MeSH terms classified as compounds (hereinafter referred to as compound classification terms). “C” is also set in the term DB 192 for SCR terms that are mapped to a Descriptor term classified as a compound and for which a RegistryNumber is recorded, as compound classification terms.
  • FIG. 10 is a diagram showing a screen (search result screen) on which the search result display unit 105 in Embodiment 1 displays the search result.
  • (1) is the main menu
  • (2) is the query
  • (3) is the submenu
  • (4) is the result
  • (5) is the history.
  • The main menu (1) is a set of buttons for switching the main function. The main functions are the document search function (Search), synonym search function (Dictionary), alert function (Alert), environment setting function (Customize), help function (Help), and logout function (Logout).
  • If the pressed button is the document search function, the document search device 100 executes the document search unit 110 to perform a literature search.
  • If the pressed button is the synonym search function, the document search apparatus 100 executes the synonym search unit 111 to perform a synonym search; if it is the alert function or the environment setting function, the apparatus executes the user environment setting unit 101 to set the environment.
  • If the pressed button is the help function, the document search device 100 displays its description information on the CRT display device 901 (an example of a display device). The description information is assumed to be stored in a storage device included in the document search device 100. If the pressed button is the logout function, the document search device 100 does not execute subsequent document search processing.
  • the query (2) is a text box for inputting a search condition expression (query, search term or designated term) for accessing MEDLINE and searching for a document.
  • The document search apparatus 100 stores in the storage device the search condition expression that the user inputs in the query (2) from the keyboard 902 (an example of an input device).
  • The submenu (3) is a set of tabs for switching the display method of the document search results: the extraction term classification display (GDCT), the hit document display (Article), and the extraction term MeSH number display (MeSH).
  • When the user presses a tab of the submenu (3) using the mouse 903 (an example of an input device), the document retrieval apparatus 100 executes the display change unit 103 and changes the display of the document search result according to the information of the pressed tab (display designation information).
  • the result (4) is a display area showing a document search result by the display method specified in the submenu (3).
  • the history (5) is a display area indicating search transition.
  • the document retrieval apparatus 100 is an example of a document retrieval apparatus that retrieves documents.
  • MEDLINE 200 is an example of a document database that stores a set of document information (for example, all document information recorded in MEDLINE 200).
  • Term DB 192 is a classification database that stores each thesaurus raster in association with the classification of genes, diseases, compounds, and tissues, a hierarchical database that stores hierarchical information (MeSH number) of thesaurus rasters, and synonyms for thesaurus rasters. It is an example of a synonym database that stores.
  • the MEDLINE access unit 116 is an example of a search statement input unit that inputs a search statement from an input device and a document information acquisition unit that acquires document information corresponding to the input search statement.
  • The MeSH term extraction unit 121 is an example of a document term extraction unit that extracts the document terms (for example, MeSH terms, including the representative term corresponding to each extraction term) included in the acquired document information (hit document information).
  • the classification acquisition unit 123 is an example of a classification acquisition unit that acquires the thesaurus raster classification from the classification database (term DB 192).
  • the search result display unit 105 is an example of an output unit that outputs an extraction term (including a representative term and MeSH number corresponding to the extraction term), a classification, a term score, and a search transition to an output device.
  • The term score calculation unit 124 is an example of a term score calculation unit that calculates a score as the ratio of the proportion of acquired document information that includes the thesaurus raster to the proportion of document information in the set of document information that includes the thesaurus raster.
  • The search transition information generation unit 126 is an example of a duplicate term number calculation unit that calculates, among the document terms (for example, MeSH terms) extracted this time, the number of document terms that are the same as document terms extracted in the past.
  • The display changing unit 103 is an example of a specified hierarchy input unit that inputs a specification of the thesaurus raster hierarchy, a thesaurus hierarchy acquisition unit that acquires the hierarchy information of thesaurus rasters, and a thesaurus raster selection unit that selects the thesaurus rasters corresponding to the hierarchy input by the specified hierarchy input unit, based on the hierarchy information acquired by the thesaurus hierarchy acquisition unit.
  • The document search unit 110 is an example of a search method input unit that inputs a specification of the search method: either a narrow search that uses the thesaurus raster included in the search sentence as the search condition, or a broad search that uses as search conditions both the thesaurus raster included in the search sentence and the thesaurus rasters below it in the hierarchy.
  • The environment setting information included in the user information includes setting information for selecting either the narrow search or the broad search (hereinafter referred to as search method information).
  • The document search unit 110 inputs the search method information from the user information DB 191.
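  • The difference between the narrow search and the broad search can be sketched with MeSH numbers, whose dotted prefixes encode the hierarchy. In the following Python sketch the function name and the dictionary are hypothetical, and the sample tree entries are illustrative only and have not been checked against any particular MeSH release.

```python
# Hypothetical excerpt of hierarchy information: MeSH number -> representative term.
MESH_TREE = {
    "C04": "Neoplasms",
    "C04.588": "Neoplasms by Site",
    "C04.588.274": "Digestive System Neoplasms",
    "C06": "Digestive System Diseases",
}


def search_terms(mesh_number, tree, method):
    """Return the terms used as search conditions for a "narrow" or "broad" search."""
    if method == "narrow":
        return [tree[mesh_number]]          # only the specified term
    if method == "broad":
        prefix = mesh_number + "."          # descendants share this prefix
        return [term for number, term in sorted(tree.items())
                if number == mesh_number or number.startswith(prefix)]
    raise ValueError(f"unknown search method: {method}")


print(search_terms("C04", MESH_TREE, "narrow"))  # prints "['Neoplasms']"
print(search_terms("C04", MESH_TREE, "broad"))
```

  • Matching on `mesh_number + "."` rather than on the bare number avoids treating an unrelated sibling such as a hypothetical "C041" as a descendant of "C04".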
  • The representative term acquisition unit 113 is an example of a synonym term acquisition unit that acquires, from the synonym database, the thesaurus rasters that have the search term as a synonym and displays them on a display device.
  • The designated term acquisition unit 114 is an example of a search term designation unit that designates, from among the displayed thesaurus rasters, the thesaurus raster to be used as the search term.
  • FIG. 11 is a diagram illustrating an appearance of the document search device 100 according to the first embodiment.
  • The document retrieval device 100 includes a system unit 910, a CRT (Cathode Ray Tube) display device 901, a keyboard (K/B) 902, a mouse 903, a compact disc device (CDD) 905, a printer device 906, and a scanner device 907, which are connected by cables.
  • the document retrieval apparatus 100 is connected to a fax machine 932 and a telephone 931 via a cable, and is connected to the Internet 940 via a local area network (LAN) 942 and a web server 941.
  • The document retrieval apparatus 100 is connected to the MEDLINE database server 200 (hereinafter referred to as MEDLINE200), the OMIM database server 210 (hereinafter referred to as OMIM210), and the MeSH database server 220 (hereinafter referred to as MeSH220) via the LAN 942 or the Internet 940.
  • FIG. 12 is a hardware configuration diagram of the document search device 100 according to the first embodiment.
  • The document retrieval apparatus 100 includes a CPU (Central Processing Unit) 911 that executes programs.
  • The CPU 911 is connected via a bus 912 to the ROM 913, RAM 914, communication board 915, CRT display device 901, K/B 902, mouse 903, FDD (Flexible Disk Drive) 904, magnetic disk device 920, CDD 905, printer device 906, and scanner device 907.
  • The CPU 911 executes the operations of each “~ unit” described below; that is, it performs operations such as output, communication, and display.
  • the RAM 914 is an example of a volatile memory.
  • ROM913, FDD904, CDD905, magnetic disk device 920, and optical disk device are examples of nonvolatile memory. These are examples of storage devices, storage devices, or storage units.
  • the OMIM210 and MeSH220 databases may be external databases connected to the Internet 940.
  • the communication board 915 is connected to a fax machine 932, a telephone 931, a LAN 942, and the like.
  • The communication board 915, K/B 902, scanner device 907, FDD 904, storage device, and the like are examples of an input device or an input unit.
  • The user environment setting unit 101, the MEDLINE update information acquisition unit 102, the display change unit 103, and the login unit 104 respectively input environment setting information, update information, display designation information, and login information from the input device.
  • The synonym search unit 111 likewise inputs a search term and a designated term from the input device.
  • The communication board 915, the CRT display device 901, the storage device, and the like are examples of an output device or an output unit.
  • the synonym search unit 111 and the search result display unit 105 output the representative term and the search result to the output device, respectively.
  • The communication board 915 is not limited to the LAN 942, and may be directly connected to the Internet 940 or to a WAN (Wide Area Network) such as ISDN.
  • When the document retrieval apparatus 100 is directly connected to the Internet 940 or to a WAN such as ISDN, the web server 941 is unnecessary.
  • the magnetic disk device 920 stores an operating system (OS) 921, a window system 922, a program group 923, and a file group 924.
  • the program group 923 is executed by the CPU 911, the OS 921, and the window system 922.
  • The program group 923 stores the programs that execute the functions described as “~ unit”, “~ DB”, and “~ storage unit” in the following description of the embodiments. The programs are read and executed by the CPU 911.
  • The arrows in the flowcharts described in the embodiments below mainly indicate input and output of data. For that input and output, the data is recorded on the magnetic disk device 920, an FD (Flexible Disk Cartridge), optical disc, CD (compact disc), MD (mini disc), DVD (Digital Versatile Disc), or other recording medium, or is transmitted through a signal line or other transmission medium.
  • What is described as a “~ unit” in the embodiments below may be realized by firmware stored in the ROM 913. Alternatively, it may be implemented by software alone, by hardware alone, by a combination of software and hardware, or by a further combination with firmware.
  • a program for carrying out the embodiment described below may be stored by using a recording device using a magnetic disk device 920, FD, an optical disk, a CD, an MD, a DVD, or other recording media.
  • FIG. 13 is a flowchart showing a document search process performed by the document search apparatus 100 according to the first embodiment.
  • The login unit 104 performs a login process of the user to the document search apparatus 100 (S101).
  • the document search unit 110 performs a document search process for accessing the MEDLINE 200 and acquiring MEDLINE search information in accordance with a user instruction (S102).
  • the search result generation unit 120 performs search result generation processing for generating a search result based on the MEDLINE search information acquired by the document search unit 110 (S103).
  • The search result display unit 105 performs a search result display process for displaying the search result on the CRT display device 901 (an example of a display device) based on the search result information generated by the search result generation unit 120 (S104).
  • The document search unit 110 repeats S102 to S104 according to the user's instructions, thereby supporting the narrowed-down search.
  • the user can acquire the refined useful literature information by specifying the hit literature display (Article) in the submenu (3) of the search result screen.
  • the display change unit 103 acquires the MEDLINE result information from the search result storage unit 193, and the search result display unit 105 displays the hit document information indicated by the MEDLINE result information as the narrowed down useful document information. Display on the device.
  • The login unit 104 displays text boxes for entering the login ID and password on the CRT display device 901 (an example of a display device) and prompts the user to enter them. The login unit 104 then inputs the login ID and password that the user specifies from the keyboard 902 (an example of an input device), outputs the input login ID, and accesses the user information DB 191.
  • the user information DB 191 outputs the search result of the user information using the login ID output from the login unit 104 as a search key.
  • the user information DB 191 stores a registered login ID of a registered user as user information, and stores a registration password, environment setting information of the user, etc.
  • The user information DB 191 searches for user information whose registered login ID matches the search key, and outputs as the search result whether such user information exists, together with the matching user information.
  • The login unit 104 compares the entered user-specified password with the password in the user information included in the search result. If they match, the login unit 104 permits the user's access to the document search device 100, and the subsequent document search processing based on the user's specifications is performed.
  • If the search result output from the user information DB 191 indicates that no such user information exists, or if the entered user-specified password does not match the password in the user information included in the search result, the login unit 104 does not permit the user to access the document search device 100, and the subsequent document search processing is not executed.
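  • The login check described above amounts to a lookup by login ID followed by a password comparison. The following Python sketch assumes an in-memory dictionary in place of the user information DB 191; the function name, the sample user, and the record layout are hypothetical. It compares plain-text passwords only to mirror the description, whereas a real system would normally store and compare password hashes.

```python
import hmac

# Hypothetical stand-in for the user information DB, keyed by registered login ID.
USER_DB = {
    "user01": {"password": "s3cret", "settings": {"search_method": "broad"}},
}


def login(login_id, password, user_db):
    """Return the user's information if access is permitted, otherwise None."""
    user_info = user_db.get(login_id)       # search with the login ID as the key
    if user_info is None:
        return None                         # no matching user information
    # Constant-time comparison of the entered and registered passwords.
    if not hmac.compare_digest(password.encode(), user_info["password"].encode()):
        return None                         # password mismatch: access refused
    return user_info


print(login("user01", "s3cret", USER_DB) is not None)  # prints "True"
print(login("user01", "wrong", USER_DB) is not None)   # prints "False"
```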
  • the login unit 104 permits the user to access the document search device 100, and the user who is permitted to access the document search function (1) in the search result screen (1)
  • The MEDLINE access unit 116 of the document search unit 110 inputs, from an input device, a query specified by the user. For example, it inputs the query that the user entered in the query field (2) of the search result screen from the keyboard 902 (an example of an input device). Alternatively, it inputs the query from a file, obtained from a storage device (an example of an input device), in which the user-specified query is set. The MEDLINE access unit 116 then accesses MEDLINE 200 with the input query, acquires MEDLINE result information from MEDLINE 200, and stores the acquired MEDLINE result information in the search result storage unit 193.
  • the MeSH term extraction unit 121 of the search result generation unit 120 performs MeSH term extraction processing for extracting the MeSH term from the hit document information included in the MEDLINE result information acquired by the document search unit 110.
  • the additional information generation unit 122 of the search result generation unit 120 performs additional information generation processing for generating additional information for each MeSH term extracted by the MeSH term extraction unit 121.
  • FIG. 14 is a flowchart showing a flow of MeSH term extraction processing of the MeSH term extraction unit 121 in the first embodiment.
  • The MeSH term extraction process of the MeSH term extraction unit 121 in the search result generation processing (S103) of the search result generation unit 120 is described below with reference to FIG. 14.
  • The MeSH term extraction unit 121 acquires hit document information from the search result storage unit 193 (S201).
  • To perform the processing of S203 to S205 for each piece of acquired hit document information, the MeSH term extraction unit 121 determines whether all of the acquired hit document information has been processed (S202).
  • The MeSH term extraction unit 121 extracts the MeSH terms from the hit document information to be processed (any unprocessed hit document information) as extraction terms (S203).
  • To perform the processing of S205 for each extracted term, the MeSH term extraction unit 121 determines whether all of the extracted terms have been processed (S204).
  • When there is an unprocessed extraction term, the MeSH term extraction unit 121 adds the representative term corresponding to the extraction term to be processed (any one of the unprocessed extraction terms) to generate an extracted term list (list information of the representative terms corresponding to the extracted terms). In addition, the number of appearances of the representative term corresponding to each extracted term is set in the extracted term list. This number of appearances indicates the number of hit documents related to the MeSH term (hereinafter referred to as the number of documents per term in the hit documents). To this end, the MeSH term extraction unit 121 outputs the extraction term to be processed and accesses the term DB 192.
  • The term DB 192, which stores MeSH term information, uses the extracted term output from the MeSH term extraction unit 121 as a search key, searches for MeSH term information (entries) whose representative term or synonymous term matches the search key, and outputs the matching MeSH term information as the search result.
  • The MeSH term extraction unit 121 compares the representative term indicated by the MeSH term information output by the term DB 192 (hereinafter referred to as the representative term to be processed) with the representative terms in the extracted term list generated so far, and determines whether the same representative term already exists in the extracted term list.
  • If the same representative term does not exist, the MeSH term extraction unit 121 adds the representative term to be processed to the extracted term list and sets the number of documents per term in the hit documents to "1" in association with the added representative term. If the same representative term already exists, the MeSH term extraction unit 121 does not add the representative term to be processed to the extracted term list, and instead increments the number of documents per term in the hit documents stored in association with that representative term. When the term DB 192 outputs multiple pieces of MeSH term information, the MeSH term extraction unit 121 sets the representative term and the number of documents per term in the hit documents in the extracted term list for the representative term indicated by each piece of MeSH term information. Then, S204 is processed again (S205).
  • the MeSH term extraction unit 121 ends the MeSH term extraction process.
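  • The loop of S201 to S205 can be sketched as follows. This is an illustration under assumptions: the `TERM_DB` dictionary stands in for the term DB 192, the document shape (`{"mesh_terms": [...]}`) and the sample terms are hypothetical, and the counter is incremented once per extracted term, as the steps above describe.

```python
from collections import Counter

# Hypothetical stand-in for the term DB 192: representative term -> synonyms.
TERM_DB = {
    "Neoplasms": {"Cancer", "Tumors"},
    "Tumor Suppressor Protein p53": {"p53", "TP53 Protein"},
}


def representative_terms(term):
    """Return every representative term whose heading or synonym matches."""
    return [rep for rep, synonyms in TERM_DB.items()
            if term == rep or term in synonyms]


def build_extracted_term_list(hit_documents):
    """Build the extracted term list: representative term -> count in the hits."""
    counts = Counter()
    for doc in hit_documents:                       # S202: each hit document
        for term in doc["mesh_terms"]:              # S203/S204: each extracted term
            for rep in representative_terms(term):  # lookup in the term DB
                counts[rep] += 1                    # S205: add with 1, or increment
    return counts
```

  • For example, two hit documents tagged `["Cancer", "p53"]` and `["Neoplasms"]` yield a count of 2 for "Neoplasms", because the synonym "Cancer" is folded into its representative term.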
  • FIG. 15 is a flowchart showing a flow of additional information generation processing of the additional information generation unit 122 in the first embodiment.
  • To perform the processing of S302 to S304 for every representative term set in the extracted term list generated by the MeSH term extraction unit 121, the additional information generation unit 122 determines whether all of the representative terms set in the extracted term list have been processed (S301).
  • the additional information generation unit 122 accesses the term DB 192 by outputting a representative term to be processed (any of the unprocessed representative terms).
  • The term DB 192, which stores MeSH term information, uses the representative term output from the additional information generation unit 122 as a search key, searches for the MeSH term information (entry) whose representative term matches the search key, and outputs that MeSH term information as the search result.
  • the additional information generating unit 122 sets the description of the term indicated by the MeSH term information output from the term DB 192 in the extracted term list in association with the representative term to be processed (S302).
  • The classification acquisition unit 123 of the additional information generation unit 122 performs classification acquisition processing, in which the classification (an example of additional information) indicated by the MeSH term information output from the term DB 192 is set in the extracted term list in association with the representative term to be processed (S303).
  • The term score calculation unit 124 of the additional information generation unit 122 performs term score calculation processing, in which it calculates the term score of the representative term to be processed and sets the calculated term score (an example of additional information) in the extracted term list in association with that representative term (S304).
  • the additional information generation unit 122 stores the extracted term list in which the additional information is set in the search result storage unit 193 as search result information (S305).
  • FIG. 16 is a flowchart showing the flow of term score calculation processing of the term score calculation unit 124 in the first embodiment.
  • the term score calculation unit 124 calculates the term score S (t) of the representative term t to be processed using the following formula (S401).
  • The total number of MEDLINE documents and the number of hit documents used in the above formula are acquired from the MEDLINE result information stored in the search result storage unit 193.
  • The number of documents per term in all documents is obtained from the MeSH term information output by the term DB 192 in S302. The number of documents per term in the hit documents is obtained from the extracted term list.
  • In the above formula, the log-odds ratio (Lod [Log Of Odds]) is calculated as the term score, using the density (pt) of the representative term to be processed in the hit document information as the standard distribution and the density (qt) of the representative term to be processed in all document information as the measured distribution.
  • The log-odds ratio is one example of scoring; the term score may be calculated by another scoring method.
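  • The formula itself is not reproduced in this text, so the following is only a sketch of one plausible reading: the score is taken as the natural logarithm of the ratio of the two densities, S(t) = log(pt / qt). The exact form (logarithm base, any smoothing, whether true odds p/(1-p) are used) is an assumption, and the function and parameter names are hypothetical.

```python
import math


def term_score(hit_docs_with_term, hit_docs, all_docs_with_term, all_docs):
    """Assumed reading of the term score: log of the density ratio p_t / q_t."""
    if min(hit_docs_with_term, hit_docs, all_docs_with_term, all_docs) <= 0:
        raise ValueError("all counts must be positive")
    p_t = hit_docs_with_term / hit_docs    # density of the term in the hit documents
    q_t = all_docs_with_term / all_docs    # density of the term in all documents
    return math.log(p_t / q_t)             # > 0 when the term is enriched in the hits
```

  • Under this reading, a term found in 10 of 100 hit documents but in only 1,000 of 100,000 documents overall is ten times denser in the hits, so its score is log(10); a term with equal densities scores 0.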
  • the term score calculation unit 124 of the additional information generation unit 122 adjusts the provisionally calculated term score (S402 to S404).
  • The number of hit documents is compared with a specific first threshold value, and the number of documents per term in the hit documents of the representative term to be processed is compared with a specific second threshold value.
  • The term score of a representative term to be processed for which the number of hit documents is less than the first threshold and the number of documents per term in the hit documents is less than the second threshold is changed to "−∞".
  • the first threshold for the number of hit documents and the second threshold for the number of documents for each term in the hit documents are preliminarily stored in the storage device of the document search apparatus 100.
  • A plurality of first threshold values and second threshold values may be provided, and the term score may be adjusted according to a plurality of conditions.
  • For example, a third threshold may be set for the number of documents per term in the hit documents. In addition to the above condition, when the number of hit documents is greater than or equal to the first threshold and the number of documents per term in the hit documents of the representative term to be processed is less than the third threshold, the term score of that representative term may be changed to "−∞".
  • The term score adjustment is not limited to setting the term score to "−∞" when the above conditions are met. For example, when the above conditions are met, the term score may be reduced at a certain rate or have a specific value subtracted from it; when the above conditions are not met, the term score may be increased at a certain rate or have a specific value added to it.
  • The term score calculated in S401 may be high simply because the number of hit documents is small, so a high score does not necessarily indicate that the MeSH term is important. In particular, a MeSH term for which both the number of hit documents and the number of documents per term in the hit documents are small is not considered important. Therefore, in S402 to S404, the term scores of MeSH terms considered to be unimportant are lowered.
  • the term score calculation unit 124 sets the term score in the extracted term list in association with the representative term to be processed (S405).
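  • The basic adjustment of S402 to S404 can be sketched as follows. The default threshold values are hypothetical placeholders (the description only says that the thresholds are stored in advance), and only the first/second-threshold condition is shown.

```python
import math


def adjust_term_score(score, hit_docs, hit_docs_with_term,
                      first_threshold=10, second_threshold=2):
    """Return -inf for terms judged unimportant, otherwise the provisional score.

    The default thresholds are illustrative assumptions, not values from the text.
    """
    if hit_docs < first_threshold and hit_docs_with_term < second_threshold:
        return -math.inf
    return score
```

  • Using negative infinity keeps such terms at the bottom of any descending sort by score without removing them from the extracted term list.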
  • The search result display unit 105 acquires the extracted term list (search result information) from the search result storage unit 193.
  • FIG. 17 is a diagram showing a result (4) of the search result screen according to the first embodiment.
  • [1] is a check box
  • [2] is a term score (Score)
  • [3] is the number of documents per term in the hit documents (Frequency)
  • [4] is a representative term (Heading)
  • [5] is a description
  • [6] is an OMIM icon
  • [7] is a PubMed icon.
  • the search result display unit 105 displays the above items ([1] to [7]) for each extracted term according to the classification of each extracted term extracted by the classification obtaining unit 123 and set in the extraction list.
  • FIG. 17 shows the display of extraction terms classified as genes (Gene).
  • The check box [1] is a check box for selecting a MeSH term to be added to the query as a search condition for the narrowed-down search.
  • The query generation unit 115 generates a query to which the representative term [4] checked by the user with the mouse 903 (an example of an input device) is added under an AND (logical product), OR (logical sum), or NOT (negation) condition.
  • The term score (Score) [2] indicates the term score.
  • The number of documents per term in the hit documents (Frequency) [3] indicates the number of documents per term in the hit documents.
  • The representative term (Heading) [4] indicates the representative term.
  • The description [5] describes the representative term [4].
  • The OMIM icon [6] is an icon that designates display of the OMIM search result page that uses the MeSH term (representative term [4]) as a search condition.
  • The PubMed icon [7] is an icon that designates display of the PubMed search result page that uses the MeSH term (representative term [4]) as a search condition.
  • The query generation unit 115 generates a query in which the term is added, by an AND condition, to the current query used to access MEDLINE.
  • The MEDLINE access unit 116 generates a URL (including the query) for accessing the search page (OMIM or PubMed) corresponding to the pressed icon, accesses it via the Internet, and obtains the search result page information corresponding to the query generated by the query generation unit 115.
  • the search result display unit 105 displays a search result page provided by OMIM or PubMed on the Internet.
  • the user uses the mouse 903 (an example of an input device) to obtain a term score [2] (Score), the number of documents per term in the hit document [3] (
  • The display changing unit 103 sorts the search result information according to the pressed item (term score, number of documents per term in the hit documents, or representative term) (display designation information), and the search result display unit 105 changes the display of the search results based on the sorted search result information (display change information).
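  • The sort on the pressed item can be sketched as follows. The row keys (`score`, `frequency`, `heading`) and the sort directions are assumptions; the description does not state whether each column sorts in ascending or descending order.

```python
# Assumed directions: numeric columns descending, headings alphabetical.
SORT_DESCENDING = {"score": True, "frequency": True, "heading": False}


def sort_search_results(rows, pressed_item):
    """Return the rows of the extracted term list sorted by the pressed column."""
    if pressed_item not in SORT_DESCENDING:
        raise ValueError(f"unknown column: {pressed_item}")
    return sorted(rows, key=lambda row: row[pressed_item],
                  reverse=SORT_DESCENDING[pressed_item])
```

  • Because Python's `sorted` orders floats correctly with negative infinity, terms whose score was adjusted to −∞ fall to the end of a descending sort by score.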
  • In Embodiment 1, it was explained that additional information (classification, term score) can be output together with the extraction terms (here also meaning the representative terms corresponding to the extraction terms) as information for narrowing down document information.
  • It was also explained that the extraction terms are displayed with the additional information attached, and that an extraction term can be added to the search condition by checking the check box [1] in the result (4) of the search result screen. In other words, it was explained that a search condition for narrowing down the document information can be added easily.
  • MeSH terms are classified according to the classification index number (specific MeSH number).
  • A MeSH term for which the number of hit documents is small and the number of documents per term in the hit documents is small is given a low term score.
  • The MeSH term extraction unit 121 may generate an extracted term list indicating the extraction terms themselves rather than the representative terms corresponding to them, and the additional information generation unit 122 may generate the additional information based on that list. That is, the extraction term itself and its additional information may be output. However, because a representative term uniquely identifies a MeSH term, outputting the representative term corresponding to the extracted term and its additional information allows the user to specify a more appropriate MeSH term as a search condition for narrowing down.
  • The document information need not be the document information recorded in MEDLINE 200, and need not be information on medical-related documents.
  • The document term need not be a MeSH term, and need not be information on medical terms.
  • The classification described in the first embodiment may be designated as a search condition in the query.
  • the MeSH terms included in the specified query are representative terms or synonymous terms, and the query generation unit 115 acquires the representative terms of the specific classification specified in the query from the term DB 192, The query is generated by replacing the obtained representative term with the MeSH term.
  • The "term description" setting (S302) and the "classification" setting (S303) in the extracted term list may be performed when the extracted term list is generated (S205). This reduces accesses to the term DB 192 and shortens the processing time.
  • The MeSH number acquisition unit of the additional information generation unit 122 performs MeSH number acquisition processing and outputs, as search result information, the MeSH number corresponding to the representative term of each extracted term as additional information.
  • In the MeSH number acquisition processing, the MeSH number (an example of additional information) indicated by the MeSH term information output by the term DB 192 is set in the extracted term list in association with the representative term to be processed.
  • Other processes and configurations of the document retrieval apparatus 100 in the second embodiment are the same as those in the first embodiment.
  • FIG. 18 is a diagram showing a result (4) of the search result screen in the second embodiment.
  • [1] is a list box for selecting the hierarchy of MeSH terms to be displayed, [2] is a MeSH number, and [3] is a MeSH icon that designates display of the MeSH search result page.
  • Other screen configurations are the same as those in the first embodiment.
  • When the user specifies, with the mouse 903 (an example of an input device), the hierarchy of MeSH terms to be displayed in the list box [1], the display changing unit 103 outputs the extracted term list (search result information) for the extraction terms corresponding to the specified hierarchy (display change information).
  • the search result display unit 105 changes the display of the search result based on the search result information (display change information) output from the display change unit 103.
  • The display changing unit 103 compares the MeSH number of each extraction term set in the extracted term list with the specified MeSH number by prefix match, and outputs the extracted term list for the extraction terms whose MeSH numbers match.
  • The extracted terms whose MeSH numbers prefix-match the specified MeSH number are the extracted terms in the specified hierarchy and in the layers below it.
  • the display change unit 103 may output an extraction term list for extraction terms that completely match the specified MeSH number.
  • the extracted terms that exactly match the specified MeSH number indicate only the extracted terms in the specified hierarchy.
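  • The two matching modes can be sketched as follows. The row shape (`{"heading": ..., "mesh_number": ...}`) is hypothetical, and the MeSH numbers in the usage note are illustrative values only.

```python
def filter_by_hierarchy(extracted_terms, specified_number, include_lower=True):
    """Select extracted terms by MeSH number.

    include_lower=True  -> prefix match: the specified hierarchy and lower layers.
    include_lower=False -> exact match: only the specified hierarchy.
    """
    if include_lower:
        return [t for t in extracted_terms
                if t["mesh_number"].startswith(specified_number)]
    return [t for t in extracted_terms
            if t["mesh_number"] == specified_number]
```

  • For example, with terms numbered `C04`, `C04.588`, and `C14.280`, specifying `C04` returns the first two under prefix match and only the first under exact match.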
  • The display change unit 103 acquires the MeSH number of the specified MeSH term from the term DB 192 or the extracted term list, and outputs the extracted term list corresponding to the specified hierarchy in the same way as when the MeSH number is specified in the list box.
  • The MEDLINE access unit 116 generates a URL (including the query) with the MeSH number [2] as the query, accesses the search page of the MeSH database on the Internet, and obtains the search result page information. Then, the search result display unit 105 displays the search result page provided by the MeSH database on the Internet based on the information acquired by the MEDLINE access unit 116.
  • MeSH terms in a specified hierarchy among MeSH terms included in hit document information can be output as information for narrowing down medical document information.
  • In Embodiment 2 as well, as in Embodiment 1, the narrowed-down document search for obtaining useful document information can be supported.
  • the search transition information generation unit 126 performs a search transition information generation process and generates a search result for the search transition.
  • The search result display unit 105 acquires the search transition information (search result information) generated by the search transition information generation unit 126 from the search result storage unit 193, and displays the history (5) of the search result screen described below.
  • the search result generation unit 120 stores the generated search result information in the search result storage unit 193 in association with the MEDLINE result information, the search execution date and time, and the query used for the search.
  • Hereinafter, the MEDLINE result information and the search result information together are referred to as result information.
  • the MeSH term extraction unit 121 performs MeSH term extraction processing.
  • the search transition information generation unit 126 of the search result generation unit 120 performs search transition information generation processing for generating search transition information based on the current result information and past result information.
  • FIG. 19 is a flowchart showing the flow of search transition information generation processing of the search transition information generation unit 126 in the third embodiment.
  • The search transition information generation process of the search transition information generation unit 126 in the search result generation process (S103) of the search result generation unit 120 will be described below with reference to FIG. 19.
  • the search transition information generation unit 126 acquires past (previous) result information from the search result storage unit 193. Also, the current MEDLINE result information is acquired (S501).
  • To perform the processing of S503 for all of the current hit document information (included in the MEDLINE result information), the search transition information generation unit 126 determines whether all of the current hit document information has been processed (S502).
  • The search transition information generation unit 126 compares the current hit document information to be processed (any unprocessed hit document information) with the past hit document information, and adds the current hit document information that duplicates any of the past hit document information to generate a duplicate document list. Then, S502 is processed again (S503).
  • To perform the processing of S505 for all of the current representative terms corresponding to the extracted terms indicated by the extracted term list included in the search result information, the search transition information generation unit 126 determines whether all of the current representative terms have been processed (S504).
  • The search transition information generation unit 126 compares the current representative term to be processed (any unprocessed representative term) with the representative terms corresponding to the past extracted terms, and adds the current representative term that duplicates any of the past representative terms to generate a duplicate term list. Then, S504 is processed again (S505).
  • The search transition information generation unit 126 stores the generated duplicate document list and duplicate term list in the search result storage unit 193 as search result information. At this time, the search transition information generation unit 126 stores the search result information in association with the MEDLINE result information, the search execution date and time, and the query used for the search (S506).
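  • The comparison in S501 to S506 amounts to intersecting the current result with a past result. The sketch below assumes that documents are identified by an ID string (for example a PubMed ID) and terms by their representative term; both identifiers and the returned dictionary shape are assumptions for illustration.

```python
def build_search_transition(current_docs, past_docs, current_terms, past_terms):
    """Return the duplicate document list and duplicate term list.

    The order of the current result is preserved in both lists.
    """
    past_doc_set = set(past_docs)      # set lookup instead of pairwise comparison
    past_term_set = set(past_terms)
    duplicate_docs = [d for d in current_docs if d in past_doc_set]      # S502-S503
    duplicate_terms = [t for t in current_terms if t in past_term_set]   # S504-S505
    return {"duplicate_documents": duplicate_docs,
            "duplicate_terms": duplicate_terms}
```

  • The lengths of the two lists correspond to the overlap counts shown in the history screen.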
  • FIGS. 20 and 21 are conceptual diagrams of search transition information in the third embodiment.
  • As a result of the above-described search transition information generation processing (S501 to S506), the search transition information generation unit 126 generates, as search transition information, the information (hit documents, representative terms corresponding to extracted terms) that overlaps between the current result information (result information 1) and the previous result information (result information 2), as shown in the conceptual diagram.
  • The search transition information generation unit 126 may also generate, as search transition information, the information that overlaps among the current result information (result information 1) and a plurality of pieces of past result information (result information 2, result information 3).
  • In that case, the search transition information generation unit 126 treats the previously generated search transition information as past result information in the search transition information generation process (S501 to S506) and generates the current search transition information.
  • FIG. 22 is a diagram showing a history (5) of the search result screen in the third embodiment.
  • [1] is the clear button
  • [2] is the history number (No)
  • [3] is the execution date (Time)
  • [4] is the query (Query)
  • [5] is the number of hit documents (Article-Num)
  • [6] is the number of overlapping documents (Article-Overlap)
  • [7] is the number of extracted terms (Term-Num)
  • [8] is the number of overlapping terms (Term-Overlap).
  • the clear button [1] is a button for designating deletion of past result information.
  • the document search apparatus 100 deletes past result information stored in the search result storage unit 193.
  • The history number [2] indicates a number for identifying the current and past searches.
  • the search result display unit 105 adds history numbers in ascending order from the latest execution date and time.
  • The execution date and time [3] indicates the date and time at which each search was executed.
  • The query [4] indicates the query used in each search.
  • The number of hit documents [5] indicates the number of hit documents in each search.
  • The number of overlapping documents [6] indicates the number of documents in the duplicate document list.
  • The number of extracted terms [7] indicates the number of representative terms corresponding to the extracted terms.
  • The number of overlapping terms [8] indicates the number of representative terms in the duplicate term list.
  • In Embodiment 3, it was explained that the number of currently extracted document terms (or documents) that are the same as document terms (or documents) extracted in the past can be output as search transition information. This lets the user recognize the status of the search transition. For example, if the number of document terms (or documents) indicated by the search transition information is extremely small compared with the number extracted this time, the narrowed-down document search was probably performed with an incorrect search condition; if it is only moderately smaller, the narrowed-down document search was probably performed with effective search conditions. Thus, Embodiment 3 can support the narrowed-down document search for obtaining useful document information.
  • In Embodiment 3 described above as well, as in Embodiment 1, the narrowed-down document search for obtaining useful document information can be supported.
  • Embodiment 4.
  • FIG. 23 is a flowchart showing the flow of document search processing of document search unit 110 in the fourth embodiment.
  • The document search process (S102) of the document search unit 110 when the synonym search function (Dictionary) is designated by the user in the main menu (1) of the search result screen will be described below with reference to FIG. 23.
  • First, the synonym search unit 111 performs synonym search processing, which outputs the representative terms corresponding to the term specified by the user (search term) and lets the user specify the term to be used for the search (specified term) (S601 to S603). Next, the query generation unit 115 performs query generation processing to generate a query using the specified terms under an AND condition or an OR condition (S604). Then, the MEDLINE access unit 116 performs MEDLINE access processing, accessing the MEDLINE 200 with the query generated by the query generation unit 115 and acquiring hit document information (S605).
  • The search term input unit 112 of the synonym search unit 111 inputs, from an input device, the term specified by the user as the search term. For example, it inputs an arbitrary term that the user enters from the keyboard 902 (an example of an input device), or it inputs the user-specified term from a file, obtained from a storage device (an example of an input device), in which that term is set.
  • The number of search terms may be one or more.
  • the representative term acquisition unit 113 of the synonym search unit 111 accesses the term DB 192 with the search term input by the search term input unit 112, and acquires a representative term corresponding to the search term.
  • The term DB 192 searches for MeSH term information (entries) whose representative term or synonymous term matches the search term, and outputs the representative term of each matching entry to the representative term acquisition unit 113.
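  • The lookup from search term to representative term can be sketched with an inverted index, as below. The dictionary contents and function names are hypothetical, matching is exact and case-sensitive for simplicity, and a search term may map to more than one representative term.

```python
def build_index(term_db):
    """Map every representative term and synonym to its representative term(s)."""
    index = {}
    for representative, synonyms in term_db.items():
        for name in {representative, *synonyms}:
            index.setdefault(name, set()).add(representative)
    return index


def lookup_representatives(index, search_terms):
    """Return, for each search term, the sorted list of matching representative terms."""
    return {term: sorted(index.get(term, set())) for term in search_terms}
```

  • Search terms that match nothing map to an empty list, so the result screen can still show the input term with no candidate headings.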
  • the designated term acquisition unit 114 of the synonym search unit 111 outputs the representative term acquired by the representative term acquisition unit 113 to the search result display unit 105, and the search result display unit 10
  • FIG. 24 is a diagram showing a result (4) of the search result screen in the fourth embodiment.
  • [1] is a search term (Input)
  • [2] is a check box
  • [3] is a representative term (Heading).
  • Search term [1] indicates a search term
  • representative term [3] indicates a representative term.
  • Check box [2] is a check box for selecting a MeSH term (specified term) as a search condition.
  • the specified term acquisition unit 114 acquires the representative term [3] checked by the user in the check box using the mouse 903 (an example of an input device) as the specified term.
  • In Embodiment 4 described above, it was explained that representative terms having the user-specified term as a synonymous term can be acquired, and that document information corresponding to the representative term the user selects from among them can be searched for.
  • Because multiple synonymous terms are commonly used for a MeSH term, the search terms specified by the user do not always retrieve all of the necessary document information. According to Embodiment 4, the user can specify a more appropriate MeSH term as a search condition for the narrowed-down search, which supports the narrowed-down document search for acquiring useful document information. This is particularly effective when searching for document information using a large number of terms as search conditions, for example when examining the literature on many genes expressed in a microarray.
  • In Embodiment 4 described above as well, as in Embodiment 1, the narrowed-down document search for obtaining useful document information can be supported.
  • The broad search method uses as search conditions both the MeSH terms included in the search sentence and the MeSH terms in the layers below them.
  • The narrow search method uses only the MeSH terms included in the search sentence as search conditions.
  • The search method is stored in the user information DB 191 as user information. When accessing MEDLINE 200, the document search unit 110 performs the following processing according to the search method stored in the user information DB 191.
  • The query generation unit 115 generates a query to which the MeSH terms in the layers below each MeSH term are added as conditions.
  • The MEDLINE access unit 116 accesses MEDLINE 200 with the query generated by the query generation unit 115, in which the lower-layer MeSH terms have been added as conditions, and acquires MEDLINE result information.
  • The query generation unit 115 accesses the term DB 192 and outputs each term in the query to it.
  • The term DB 192 looks up the MeSH number of the MeSH term that matches each term output by the query generation unit 115. It then searches for the MeSH numbers whose prefix matches that MeSH number and outputs the MeSH terms with those numbers to the query generation unit 115. The query generation unit 115 then generates a query in which the MeSH terms output from the term DB 192 are combined with an OR condition.
  • The MEDLINE access unit 116 accesses MEDLINE 200 with the single query generated by the query generation unit 115 and acquires MEDLINE result information.
  • The MEDLINE update information acquisition unit 102 periodically accesses the NLM FTP site to acquire MEDLINE 200 update information and, based on that update information, updates the per-term document count over all documents in each MeSH term information entry of the term DB 192.
  • The MeSH term extraction unit 121 treats the document information indicated by the update information as hit documents and generates an extracted term list in the same manner as in the first embodiment.
  • The MEDLINE update information acquisition unit 102 outputs the representative terms set in the extracted term list, together with the per-term document counts in the hit documents, to the term DB 192.
  • The term DB 192 looks up the stored representative term that is the same as the representative term output by the MEDLINE update information acquisition unit 102.
  • The per-term document count in the hit documents output from the MEDLINE update information acquisition unit 102 is added to the per-term document count over all documents stored in association with that representative term.
  • In this way, the MEDLINE update information acquisition unit 102 updates the per-term document count over all documents in each MeSH term information entry of the term DB 192 based on the update information.
  • The MEDLINE update information acquisition unit 102 notifies the user of updates to the MEDLINE 200 document information.
  • The user information DB 191 stores the user's e-mail address as user information.
  • The MEDLINE update information acquisition unit 102 acquires the e-mail address from the user information DB 191 and sends to that address an e-mail notifying the user of updates to the MEDLINE 200 document information.
  • A query defined by the user may also be stored as user information; when MEDLINE 200 is updated, the MEDLINE update information acquisition unit 102 may access MEDLINE 200 with that query and notify the user of the search result.
  • The user environment setting unit 101 displays, on the CRT display device 901 (an example of a display device), check boxes and text boxes for specifying environment setting information such as the e-mail address, user-defined query, term score thresholds, MeSH term hierarchy list, search method, and password, and updates the user information in the user information DB 191 with the information specified by the user.
  • In Embodiment 6 described above, it has been explained that updates to the MEDLINE 200 document information can be handled. That is, even when new document information is added to MEDLINE 200, the term score described in the first embodiment can still be calculated.
  • It has also been explained that the following settings can be changed: the notification destination for MEDLINE 200 updates, the query used for notifying search results, the thresholds for adjusting the term score described in the first embodiment, the MeSH term hierarchy set in the list box described in the second embodiment, the search method described in Embodiment 5, and the login password. As a result, user convenience can be improved.
  • FIG. 1 is a configuration diagram of a document search apparatus 100 according to Embodiment 1.
  • FIG. 2 is a configuration diagram of a document search unit 110 in the first embodiment.
  • FIG. 3 is a configuration diagram of a search result generation unit 120 according to Embodiment 1.
  • FIG. 4 is a diagram showing the term table of MeSH term information managed and stored in the term DB 192 in Embodiment 1.
  • FIG. 10 is a diagram showing a screen (search result screen) on which search result display section 105 in Embodiment 1 displays a search result.
  • FIG. 11 is a diagram showing an appearance of the document search device 100 according to the first embodiment.
  • FIG. 13 is a flowchart showing the flow of a document search process of the document search apparatus 100 in the first embodiment.
  • FIG. 14 is a flowchart showing a flow of MeSH term extraction processing of the MeSH term extraction unit 121 in the first embodiment.
  • FIG. 16 is a flowchart showing the flow of term score calculation processing of the term score calculation unit 124 in the first embodiment.
  • FIG. 17 is a diagram showing a result (4) of the search result screen in the first embodiment.
  • FIG. 18 shows a result (4) of the search result screen in the second embodiment.
  • FIG. 19 is a flowchart showing the flow of search transition information generation processing of the search transition information generation unit 126 in the third embodiment.
  • FIG. 22 is a diagram showing a history (5) of a search result screen in the third embodiment.
  • FIG. 23 is a flowchart showing the flow of a document search process of the document search unit 110 in the fourth embodiment.
  • FIG. 24 is a diagram showing a result (4) of the search result screen in the fourth embodiment.
  • Explanation of symbols: 100 document search device, 101 user environment setting unit, 102 MEDLINE update information acquisition unit, 103 display change unit, 104 login unit, 105 search result display unit, 110 document search unit, 111 synonym search unit, 112 search term input unit, 113 representative term acquisition unit, 114 designated term acquisition unit, 115 query generation unit, 116 MEDLINE access unit, 120 search result generation unit, 121 MeSH term extraction unit, 122 additional information generation unit, 123 classification acquisition unit, 124 term score calculation unit, 125 MeSH number acquisition unit, 126 search transition information generation unit, 191 user information DB, 192 term DB, 193 search result storage unit, 200 MEDLINE, 210 OMIM, 220 MeSH, 901 CRT display device, 902 K/B, 903 mouse, 904 FDD, 905 CDD, 906 printer device, 907 scanner device, 910 system unit, 911 CPU, 912 bus, 913 ROM, 914 RAM, 915 communication board, 920 magnetic disk device, 921 OS, 9

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

To support narrowing down a large number of search results, the device outputs the document terms found in the document information, the class to which each document term belongs, and a score indicating the importance of each document term. A document retriever (110) accesses a document database (200) with an input query to acquire document information. A retrieval result generator (120) extracts document terms from the document information and generates a class and a score for each. A retrieval result display (105) displays the extracted document terms together with their classes and scores. The class is based on a hierarchical structure of the document terms. The score is high when a term appears infrequently in the document information as a whole and frequently in the hit document information.

Description

Specification
Document retrieval device, method, and program
Technical field
[0001] The present invention relates to a document search device, a document search method, and a document search program that support document searches.
Background art
[0002] Conventional document search systems merely display the hit documents, so the user has had to think up and add the keywords for each re-search. In other words, when re-searching to narrow down the hit documents, users themselves had to come up with keywords suited to the narrowing. Narrowing searches with conventional document search systems were therefore very difficult. Moreover, re-searching with a simple keyword hits an enormous number of documents (for example, 10,000 or more), and it is difficult to select the important documents from search results of that size and obtain the knowledge they contain.
[0003] For example, a conventional medical literature search system is one example of a system in which narrowing searches are difficult. Such a system is described below.
Medical literature databases include MEDLINE (MEDlars onLINE), provided by the U.S. National Library of Medicine (NLM), which holds more than 10 million items of medical document information. A literature search system called PubMed is provided for accessing MEDLINE; users access a web page that has an interface to PubMed and search the medical document information recorded in MEDLINE.
The document information recorded in MEDLINE includes medical terms closely related to the document (MeSH terms: Medical Subject Headings), terms representing the names or properties of proteins and compounds (Substance), the RegistryNumber, and so on, which serve as keys when searching. Information on MeSH terms and Substances is recorded in the MeSH database, which, like MEDLINE, is accessible through PubMed. The MeSH database comprises three databases: Descriptor, Qualifier, and SCR (Supplementary Concept Records).
For each MeSH term (Descriptor term), the Descriptor database stores information such as the Heading, a name that uniquely identifies the MeSH term; synonyms of the Heading; the TreeNumber, which identifies the MeSH term within the hierarchical structure; and a description of the MeSH term. The SCR stores information on Substance terms (SCR terms) in association with the MeSH terms stored in the Descriptor database.
Other resources used by medical literature search systems include the OMIM (Online Mendelian Inheritance in Man) database, through which users can access information on genes and diseases from a web page, and the CAS (Chemical Abstracts Service) database, which stores chemical information including the RegistryNumber (CAS registry number) that identifies a chemical substance.
Patent Document 1: Japanese Patent Laid-Open No. 2003-044481
Disclosure of the invention
Problems to be solved by the invention
[0004] In view of the above problems, an object of the present invention is, for example, in a document search over a large volume of document information, to support the search so that the many hits can be narrowed down and useful document information obtained, and to encourage new medical and molecular-biological discoveries concerning the relationships among molecules, diseases, and the like.
Means for solving the problems
[0005] The document search device of the present invention is connected to a document database that stores a set of document information containing thesaurus terms and to a classification database that stores each thesaurus term in association with the classifications gene, disease, compound, and tissue, and comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit; a classification acquisition unit that acquires, from the classification database, the classification of each thesaurus term extracted by the thesaurus term extraction unit; and an output unit that outputs to an output device the thesaurus terms extracted by the thesaurus term extraction unit in association with the classifications acquired by the classification acquisition unit.
[0006] The document search device of the present invention is connected to a document database that stores a set of document information containing thesaurus terms and to a classification database that stores each thesaurus term in association with a classification that is one of gene, disease, compound, and tissue and is based on the hierarchical structure of the thesaurus terms, and comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit; a classification acquisition unit that acquires, from the classification database, the classification of each thesaurus term extracted by the thesaurus term extraction unit; and an output unit that outputs to an output device the thesaurus terms extracted by the thesaurus term extraction unit in association with the classifications acquired by the classification acquisition unit.
[0007] The document search device of the present invention is connected to a document database that stores a set of document information containing thesaurus terms, and comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit; a term score calculation unit that calculates, for each thesaurus term extracted by the thesaurus term extraction unit, a score expressing the proportion of the acquired document information that contains the thesaurus term relative to the proportion of the whole set of document information that contains the thesaurus term; and an output unit that outputs to an output device the thesaurus terms extracted by the thesaurus term extraction unit in association with the scores calculated by the term score calculation unit.
[0008] The device further comprises a storage device that stores a first threshold for the number of items of acquired document information and a second threshold for the number of items of acquired document information that contain the thesaurus term. The term score calculation unit acquires the first and second thresholds from the storage device and, based on the two thresholds, the number of items of acquired document information, and the number of those items that contain the thesaurus term, calculates a low score for a thesaurus term when the number of items of acquired document information is below the first threshold and the number of those items containing the thesaurus term is below the second threshold.
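The score and threshold rule described in the two paragraphs above can be sketched as follows. This is a minimal illustration, not the specification's implementation: the function name `term_score`, its parameter names, and the `penalty` factor are assumptions introduced here. The text only states that the score is calculated "low" when both counts fall below their thresholds; multiplying by a penalty factor is one hypothetical way to do that.

```python
def term_score(hit_with_term, hit_total, all_with_term, all_total,
               first_threshold=None, second_threshold=None, penalty=0.0):
    """Share of hit documents containing the term, relative to the share of
    all documents containing the term (hypothetical helper, for illustration).
    """
    if hit_total == 0 or all_total == 0 or all_with_term == 0:
        return 0.0  # nothing to compare against
    hit_share = hit_with_term / hit_total
    overall_share = all_with_term / all_total
    score = hit_share / overall_share
    # Assumed lowering rule: the text says only that the score is made low
    # when BOTH counts are below their thresholds.
    if (first_threshold is not None and second_threshold is not None
            and hit_total < first_threshold
            and hit_with_term < second_threshold):
        score *= penalty
    return score
```

A term found in 5 of 10 hit documents but in only 100 of 10,000 documents overall gets a score of about 50, while a term that is equally common in both sets scores 1.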
[0009] The document search device of the present invention is connected to a document database that stores a set of document information containing thesaurus terms, and comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit and stores them in a storage device; a duplicate term count calculation unit that calculates how many of the thesaurus terms extracted this time by the thesaurus term extraction unit are the same as thesaurus terms that the thesaurus term extraction unit extracted in the past and stored in the storage device; and an output unit that outputs to an output device the number of thesaurus terms calculated by the duplicate term count calculation unit.
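As a rough sketch of the duplicate term count described above, the class below keeps the previously extracted terms in memory and reports how many of the newly extracted terms were already seen. The class and method names are hypothetical, and an in-memory set stands in for the storage device.

```python
class ExtractionHistory:
    """Remembers previously extracted thesaurus terms (illustrative only)."""

    def __init__(self):
        self._seen = set()  # stands in for the storage device

    def record_and_count(self, extracted_terms):
        """Return how many of the current terms were extracted before,
        then store the current terms for later searches."""
        current = set(extracted_terms)
        repeated = len(current & self._seen)
        self._seen |= current
        return repeated
```

Each distinct term is counted once per search, so a term that appears in several hit documents of the same search does not inflate the count.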
[0010] The document search device of the present invention is connected to a document database that stores a set of document information containing thesaurus terms and to a hierarchy database that stores hierarchy information on thesaurus terms, and comprises: a search sentence input unit that inputs a search sentence from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit; a thesaurus hierarchy acquisition unit that acquires, from the hierarchy database, hierarchy information on the thesaurus terms extracted by the thesaurus term extraction unit; a designated hierarchy input unit that inputs a designation of a thesaurus term hierarchy level; a thesaurus term selection unit that selects, from among the thesaurus terms extracted by the thesaurus term extraction unit and based on the hierarchy information acquired by the thesaurus hierarchy acquisition unit, the thesaurus terms corresponding to the hierarchy level input by the designated hierarchy input unit; and an output unit that outputs to an output device the thesaurus terms selected by the thesaurus term selection unit.
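The selection by hierarchy level described above might look like the following sketch. It assumes that the hierarchy information is a dot-separated tree number of the kind the background section attributes to MeSH (TreeNumber) and that the level is the number of dot-separated segments; both the level convention and the function names are assumptions made here, and the sample tree numbers in the usage note are made up.

```python
def tree_level(tree_number):
    """Level of a dot-separated tree number: 'X01' is 1, 'X01.100' is 2."""
    return tree_number.count(".") + 1


def select_terms_at_level(term_tree_numbers, level):
    """Keep the terms that have at least one tree number at the given level.

    term_tree_numbers maps each extracted term to its list of tree numbers
    (a term may sit at several places in the hierarchy).
    """
    return [term for term, numbers in term_tree_numbers.items()
            if any(tree_level(n) == level for n in numbers)]
```

For example, with `{"Alpha": ["X01"], "Beta": ["X01.100"]}`, designating level 2 keeps only `"Beta"`.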
[0011] The document search device of the present invention is connected to a document database that stores a set of document information containing thesaurus terms, and comprises: a search method input unit that inputs, for a search sentence containing thesaurus terms, a designation of either a narrow search method, whose search conditions are the thesaurus terms contained in the search sentence, or a broad search method, whose search conditions are the thesaurus terms contained in the search sentence together with the thesaurus terms in the layers below them; a search sentence input unit that inputs a search sentence containing thesaurus terms from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit, following the search conditions of the search method input by the search method input unit; a thesaurus term extraction unit that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit; and an output unit that outputs to an output device the thesaurus terms extracted by the thesaurus term extraction unit.
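To make the difference between the two search methods concrete, the sketch below builds the OR group for a single term. It assumes that lower-layer terms are those whose tree number extends the parent's tree number, in line with the prefix matching described for the fifth embodiment. The function names and sample data are hypothetical, and the quoted `OR` string is a generic illustration rather than the query syntax of any real search service.

```python
def lower_layer_terms(term, term_tree_numbers):
    """Terms whose tree number starts with one of `term`'s tree numbers
    followed by a dot, i.e. terms below it in the hierarchy."""
    own = term_tree_numbers.get(term, [])
    return [other for other, numbers in term_tree_numbers.items()
            if other != term
            and any(n.startswith(p + ".") for n in numbers for p in own)]


def or_group(term, term_tree_numbers, method="narrow"):
    """Search condition for one term under the narrow or broad method."""
    terms = [term]
    if method == "broad":
        terms += lower_layer_terms(term, term_tree_numbers)
    return " OR ".join('"{}"'.format(t) for t in terms)
```

Matching on the prefix plus a dot keeps a number such as `X010` from being treated as a descendant of `X01`.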
[0012] The document search device of the present invention is connected to a document database that stores a set of document information containing thesaurus terms and to a synonym database that stores synonyms of thesaurus terms, and comprises: a search word input unit that inputs a plurality of search words from an input device; a synonymous term acquisition unit that acquires at once, from the synonym database, the thesaurus terms that have each of the search words input by the search word input unit as a synonym, and displays them on a display device; a search word designation unit that has the user designate, from the thesaurus terms acquired and displayed by the synonymous term acquisition unit, the thesaurus terms to use as search words; a search sentence input unit that inputs from an input device a search sentence containing the thesaurus terms designated through the search word designation unit; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit; and an output unit that outputs to an output device the thesaurus terms extracted by the thesaurus term extraction unit.
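The batch lookup of thesaurus terms by synonym can be illustrated with a small sketch. The synonym database is modeled here as a plain dictionary from a thesaurus term to its list of synonyms; that layout, the function name, and the case-insensitive matching are all assumptions made for the example.

```python
def representative_terms(search_words, synonym_db):
    """For every search word, list the thesaurus terms that have it as a synonym.

    synonym_db maps a thesaurus term to its synonyms (illustrative layout).
    Matching ignores case and surrounding whitespace (an assumption).
    """
    result = {}
    for word in search_words:
        key = word.strip().lower()
        result[word] = sorted(
            heading for heading, synonyms in synonym_db.items()
            if key in (s.lower() for s in synonyms)
        )
    return result
```

A search word that maps to several thesaurus terms yields several candidates, which is the point at which the user would be asked to choose the terms to search with.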
[0013] The document search method of the document search device of the present invention is a document search method for a document search device that searches documents and is connected to a document database that stores a set of document information containing thesaurus terms and to a classification database that stores each thesaurus term in association with the classifications gene, disease, compound, and tissue, wherein: a search sentence input unit performs search sentence input processing that inputs a search sentence from an input device; a document information acquisition unit performs document information acquisition processing that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit performs thesaurus term extraction processing that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit; a classification acquisition unit performs classification acquisition processing that acquires, from the classification database, the classification of each thesaurus term extracted by the thesaurus term extraction unit; and an output unit performs output processing that outputs to an output device the thesaurus terms extracted by the thesaurus term extraction unit in association with the classifications acquired by the classification acquisition unit.
[0014] The document search method of the document search device of the present invention is a document search method for a document search device that searches documents and is connected to a document database that stores a set of document information containing thesaurus terms and to a classification database that stores each thesaurus term in association with a classification that is one of gene, disease, compound, and tissue and is based on the hierarchical structure of the thesaurus terms, wherein: a search sentence input unit performs search sentence input processing that inputs a search sentence from an input device; a document information acquisition unit performs document information acquisition processing that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit performs thesaurus term extraction processing that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit; a classification acquisition unit performs classification acquisition processing that acquires, from the classification database, the classification of each thesaurus term extracted by the thesaurus term extraction unit; and an output unit performs output processing that outputs to an output device the thesaurus terms extracted by the thesaurus term extraction unit in association with the classifications acquired by the classification acquisition unit.
[0015] The document search method of the document search device of the present invention is a document search method for a document search device that searches documents and is connected to a document database that stores a set of document information containing thesaurus terms, wherein: a search sentence input unit performs search sentence input processing that inputs a search sentence from an input device; a document information acquisition unit performs document information acquisition processing that acquires, from the document database, the document information corresponding to the search sentence input by the search sentence input unit; a thesaurus term extraction unit performs thesaurus term extraction processing that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit; a term score calculation unit performs term score calculation processing that calculates, for each thesaurus term extracted by the thesaurus term extraction unit, a score expressing the proportion of the acquired document information that contains the thesaurus term relative to the proportion of the whole set of document information that contains the thesaurus term; and an output unit performs output processing that outputs to an output device the thesaurus terms extracted by the thesaurus term extraction unit in association with the scores calculated by the term score calculation unit.
[0016] The method is also a document search method for the document search device further comprising a storage device that stores a first threshold for the number of items of acquired document information and a second threshold for the number of items of document information that contain the thesaurus term, wherein, in the term score calculation processing, the term score calculation unit acquires the first and second thresholds from the storage device and, based on the two thresholds, the number of items of acquired document information, and the number of those items that contain the thesaurus term, calculates a low score for a thesaurus term when the number of items of acquired document information is below the first threshold and the number of those items containing the thesaurus term is below the second threshold.
[0017] A document retrieval method of the present invention is a method for a document retrieval device that is connected to a document database storing a set of document information containing thesaurus terms and that searches for documents. A search statement input unit performs a search statement input process of inputting a search statement from an input device. A document information acquisition unit performs a document information acquisition process of acquiring, from the document database, the document information corresponding to the search statement input by the search statement input unit. A thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms contained in the acquired document information and storing them in a storage device. A duplicate term count calculation unit performs a duplicate term count calculation process of calculating how many of the thesaurus terms extracted this time are the same as thesaurus terms that the thesaurus term extraction unit extracted in the past and stored in the storage device. An output unit performs an output process of outputting the number of thesaurus terms calculated by the duplicate term count calculation unit to an output device.
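The duplicate term count in [0017] amounts to the size of the intersection between the current and the previously stored term sets. A minimal sketch, with hypothetical names and placeholder terms:

```python
def count_duplicate_terms(current_terms, past_terms) -> int:
    """Number of distinct terms extracted this time that were also extracted before."""
    return len(set(current_terms) & set(past_terms))


past = ["Term A", "Term B", "Term C"]      # stored from an earlier search
current = ["Term B", "Term C", "Term D"]   # extracted from the current search
print(count_duplicate_terms(current, past))
```

A high count relative to the number of current terms suggests that successive searches are covering similar ground.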
[0018] A document retrieval method of the present invention is a method for a document retrieval device that is connected to a document database storing a set of document information containing thesaurus terms and to a hierarchy database storing hierarchy information on the thesaurus terms, and that searches for documents. A search statement input unit performs a search statement input process of inputting a search statement from an input device. A document information acquisition unit performs a document information acquisition process of acquiring, from the document database, the document information corresponding to the search statement input by the search statement input unit. A thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms contained in the acquired document information. A thesaurus hierarchy acquisition unit performs a thesaurus hierarchy acquisition process of acquiring, from the hierarchy database, the hierarchy information on the extracted thesaurus terms. A designated hierarchy input unit performs a designated hierarchy input process of inputting a designation of a thesaurus term hierarchy level. A thesaurus term selection unit performs a thesaurus term selection process of selecting, from the extracted thesaurus terms and based on the acquired hierarchy information, the thesaurus terms that correspond to the designated hierarchy level. An output unit performs an output process of outputting the selected thesaurus terms to an output device.
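The hierarchy-based selection in [0018] can be sketched as below. This sketch assumes that hierarchy information is a dot-separated tree number per term and that the level equals the number of segments; the text here does not state how levels are encoded, and the tree numbers and term names shown are made-up placeholders.

```python
def tree_depth(tree_number: str) -> int:
    """Assumed level of a tree number: the count of its dot-separated segments."""
    return len(tree_number.split("."))


def select_terms_at_level(terms, hierarchy, level: int):
    """Keep the terms that have at least one tree number at the designated level."""
    selected = []
    for term in terms:
        tree_numbers = hierarchy.get(term, [])  # a term may sit in several places
        if any(tree_depth(n) == level for n in tree_numbers):
            selected.append(term)
    return selected


# Hypothetical hierarchy database: term -> list of tree numbers.
hierarchy = {
    "Term A": ["X01"],
    "Term B": ["X01.100"],
    "Term C": ["X01.100.200", "Y02.300"],
}
print(select_terms_at_level(["Term A", "Term B", "Term C"], hierarchy, level=2))
```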
[0019] A document retrieval method of the present invention is a method for a document retrieval device that is connected to a document database storing a set of document information containing thesaurus terms and that searches for documents. For a search statement containing a thesaurus term, a search method input unit performs a search method input process of inputting a designation of one of two search methods: a narrow search method, in which the thesaurus term contained in the search statement is the search condition, and a broad search method, in which the thesaurus term contained in the search statement and the thesaurus terms in the levels below it are the search conditions. A search statement input unit performs a search statement input process of inputting a search statement containing a thesaurus term from an input device. A document information acquisition unit performs a document information acquisition process of acquiring, from the document database, the document information corresponding to the input search statement in accordance with the search conditions of the designated search method. A thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms contained in the acquired document information. An output unit performs an output process of outputting the extracted thesaurus terms to an output device.
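The difference between the narrow and broad search methods in [0019] is whether lower-level terms are added to the search condition. A minimal sketch, assuming the hierarchy is available as a parent-to-children mapping (a hypothetical structure with placeholder terms):

```python
def expand_search_terms(term: str, children: dict, mode: str) -> set:
    """Return the set of terms used as the search condition for the given mode."""
    if mode == "narrow":
        return {term}                      # only the term itself
    if mode != "broad":
        raise ValueError(f"unknown search mode: {mode}")
    expanded, stack = set(), [term]
    while stack:                           # walk all descendants iteratively
        current = stack.pop()
        if current in expanded:
            continue
        expanded.add(current)
        stack.extend(children.get(current, []))
    return expanded


# Hypothetical hierarchy: parent term -> terms one level below it.
children = {"Term A": ["Term B", "Term C"], "Term B": ["Term D"]}
print(sorted(expand_search_terms("Term A", children, "narrow")))
print(sorted(expand_search_terms("Term A", children, "broad")))
```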
[0020] A document retrieval method of the present invention is a method for a document retrieval device that is connected to a document database storing a set of document information containing thesaurus terms and to a synonym database storing synonyms of the thesaurus terms, and that searches for documents. A search term input unit performs a search term input process of inputting a plurality of search terms from an input device. A synonym term acquisition unit performs a synonym term acquisition process of acquiring, from the synonym database, each thesaurus term that has one of the input search terms as a synonym. A designation unit performs a designation process of having the thesaurus terms to be used as search terms designated from among the acquired thesaurus terms. A search statement input unit performs a search statement input process of inputting, from an input device, a search statement containing the thesaurus terms designated through the designation unit. A document information acquisition unit performs a document information acquisition process of acquiring, from the document database, the document information corresponding to the input search statement. A thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms contained in the acquired document information. An output unit performs an output process of outputting the extracted thesaurus terms to an output device.
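The synonym term acquisition step in [0020] can be sketched as a lookup from each search word to the thesaurus terms that list it as a synonym. The database layout (heading mapped to a list of synonyms), the case-insensitive matching, and all names are assumptions for illustration.

```python
def find_thesaurus_terms(search_words, synonym_db: dict) -> dict:
    """Map each search word to the thesaurus terms that have it as a synonym."""
    result = {}
    for word in search_words:
        key = word.casefold()  # assumption: matching ignores letter case
        result[word] = sorted(
            heading
            for heading, synonyms in synonym_db.items()
            if key in (s.casefold() for s in synonyms)
        )
    return result


# Hypothetical synonym database: thesaurus term -> its synonyms.
synonym_db = {
    "Heading A": ["alpha", "a-term"],
    "Heading B": ["beta", "alpha"],
}
print(find_thesaurus_terms(["Alpha", "gamma"], synonym_db))
```

The candidates returned for each word would then be presented so that the terms actually used in the search statement can be designated.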
[0021] A document retrieval program of the present invention causes a computer to execute any one of the document retrieval methods described above.
[0022] A document retrieval device of the present invention is connected to a document database storing a set of document information and to a classification database storing each document term in association with a specific classification, and comprises: a search statement input unit that inputs a search statement from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search statement input by the search statement input unit; a document term extraction unit that extracts the document terms contained in the acquired document information; a classification acquisition unit that acquires, from the classification database, the classification of each extracted document term; and an output unit that outputs each extracted document term to an output device in association with the classification acquired for it.
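The classification acquisition and output pairing in [0022] can be sketched as a dictionary lookup. The in-memory dictionary standing in for the classification database, the names, and the `"other"` fallback for terms without an entry are assumptions for illustration.

```python
def classify_terms(terms, classification_db: dict, default: str = "other"):
    """Pair each extracted term with its classification from the database."""
    # Assumption: terms missing from the database fall back to `default`.
    return [(term, classification_db.get(term, default)) for term in terms]


# Hypothetical classification database: term -> classification.
classification_db = {"Term A": "gene", "Term B": "disease"}
for term, classification in classify_terms(["Term A", "Term B", "Term C"],
                                           classification_db):
    print(f"{term}\t{classification}")
```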
[0023] A document retrieval device of the present invention is connected to a document database storing a set of document information, and comprises: a search statement input unit that inputs a search statement from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search statement input by the search statement input unit; a document term extraction unit that extracts the document terms contained in the acquired document information; a term score calculation unit that calculates, for each extracted document term, a score given by the ratio of (a) the proportion of the acquired document information that contains the document term to (b) the proportion of the whole set of document information that contains the document term; and an output unit that outputs each extracted document term to an output device in association with the score calculated for it.
[0024] The device further comprises a storage device that stores a first threshold for the number of items of acquired document information and a second threshold for the number of items of acquired document information containing the document term. The term score calculation unit acquires the first threshold and the second threshold from the storage device and, based on the first threshold, the second threshold, the number of items of acquired document information, and the number of items of acquired document information containing the document term, calculates a low score for any document term for which the number of items of acquired document information is below the first threshold and the number of items of acquired document information containing the term is below the second threshold.
[0025] A document retrieval device of the present invention is connected to a document database storing a set of document information, and comprises: a search statement input unit that inputs a search statement from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search statement input by the search statement input unit; a document term extraction unit that extracts the document terms contained in the acquired document information and stores them in a storage device; a duplicate term count calculation unit that calculates how many of the document terms extracted this time are the same as document terms that the document term extraction unit extracted in the past and stored in the storage device; and an output unit that outputs the number of document terms calculated by the duplicate term count calculation unit to an output device.
[0026] A document retrieval method of the present invention is a method for a document retrieval device that is connected to a document database storing a set of document information and to a classification database storing each document term in association with a specific classification, and that searches for documents. A search statement input unit performs a search statement input process of inputting a search statement from an input device. A document information acquisition unit performs a document information acquisition process of acquiring, from the document database, the document information corresponding to the search statement input by the search statement input unit. A document term extraction unit performs a document term extraction process of extracting the document terms contained in the acquired document information. A classification acquisition unit performs a classification acquisition process of acquiring, from the classification database, the classification of each extracted document term. An output unit performs an output process of outputting each extracted document term to an output device in association with the classification acquired for it.
[0027] A document retrieval method of the present invention is a method for a document retrieval device that is connected to a document database storing a set of document information and that searches for documents. A search statement input unit performs a search statement input process of inputting a search statement from an input device. A document information acquisition unit performs a document information acquisition process of acquiring, from the document database, the document information corresponding to the search statement input by the search statement input unit. A document term extraction unit performs a document term extraction process of extracting the document terms contained in the acquired document information. A term score calculation unit performs a term score calculation process of calculating, for each extracted document term, a score given by the ratio of (a) the proportion of the acquired document information that contains the document term to (b) the proportion of the whole set of document information that contains the document term. An output unit performs an output process of outputting each extracted document term to an output device in association with the score calculated for it.
[0028] The document retrieval method may also be a method for the above document retrieval device further comprising a storage device that stores a first threshold for the number of items of acquired document information and a second threshold for the number of items of document information containing the document term. In the term score calculation process, the term score calculation unit acquires the first threshold and the second threshold from the storage device and, based on the first threshold, the second threshold, the number of items of acquired document information, and the number of items of acquired document information containing the document term, calculates a low score for any document term for which the number of items of acquired document information is below the first threshold and the number of items of acquired document information containing the term is below the second threshold.
[0029] A document retrieval method of the present invention is a method for a document retrieval device that is connected to a document database storing a set of document information and that searches for documents. A search statement input unit performs a search statement input process of inputting a search statement from an input device. A document information acquisition unit performs a document information acquisition process of acquiring, from the document database, the document information corresponding to the search statement input by the search statement input unit. A document term extraction unit performs a document term extraction process of extracting the document terms contained in the acquired document information and storing them in a storage device. A duplicate term count calculation unit performs a duplicate term count calculation process of calculating how many of the document terms extracted this time are the same as document terms that the document term extraction unit extracted in the past and stored in the storage device. An output unit performs an output process of outputting the number of document terms calculated by the duplicate term count calculation unit to an output device.
[0030] A document retrieval program of the present invention causes a computer to execute any one of the document retrieval methods described above.
[0031] A document retrieval device of the present invention is connected to a document database storing a set of document information containing thesaurus terms and to a classification database storing each thesaurus term in association with one of the classifications gene, disease, compound, and tissue, and comprises: a search statement input unit that inputs a search statement from an input device; a document information acquisition unit that acquires, from the document database, the document information corresponding to the search statement input by the search statement input unit; a thesaurus term extraction unit that extracts the thesaurus terms contained in the acquired document information; a classification acquisition unit that acquires, from the classification database, the classification of each extracted thesaurus term; a term score calculation unit that calculates, for each extracted thesaurus term, a score given by the ratio of (a) the proportion of the acquired document information that contains the thesaurus term to (b) the proportion of the whole set of document information that contains the thesaurus term; and an output unit that outputs to an output device, together with each extracted thesaurus term, the classification acquired for it and the score calculated for it, as information on relationships among molecules, diseases, and the like.
[0032] A document retrieval method of the present invention is a method for a document retrieval device that is connected to a document database storing a set of document information containing thesaurus terms and to a classification database storing each thesaurus term in association with one of the classifications gene, disease, compound, and tissue, and that searches for documents. A search statement input unit performs a search statement input process of inputting a search statement from an input device. A document information acquisition unit performs a document information acquisition process of acquiring, from the document database, the document information corresponding to the search statement input by the search statement input unit. A thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms contained in the acquired document information. A classification acquisition unit performs a classification acquisition process of acquiring, from the classification database, the classification of each extracted thesaurus term. A term score calculation unit performs a term score calculation process of calculating, for each extracted thesaurus term, a score given by the ratio of (a) the proportion of the acquired document information that contains the thesaurus term to (b) the proportion of the whole set of document information that contains the thesaurus term. An output unit performs an output process of outputting to an output device, together with each extracted thesaurus term, the classification acquired for it and the score calculated for it, as information on relationships among molecules, diseases, and the like.
[0033] A document retrieval program of the present invention causes a computer to execute the document retrieval method described above.
Effects of the Invention
[0034] According to the present invention, the classification (gene, disease, compound, tissue) and the term score can be output together with the thesaurus terms (for example, MeSH terms) contained in the hit document information, as information for narrowing down medical document information. This supports narrowed-down document searches for obtaining useful document information.
[0035] Further, according to the present invention, the classification and the term score can be output together with the thesaurus terms contained in the hit document information, as information on relationships among molecules, diseases, and the like. This can encourage discoveries in medicine and molecular biology.
[0036] Further, the number of thesaurus terms extracted this time that are the same as thesaurus terms extracted in the past can be output, so the progression of the search can be seen. This supports narrowed-down document searches for obtaining useful document information.
[0037] Further, of the thesaurus terms contained in the hit document information, those at a designated hierarchy level can be output as information for narrowing down medical document information. This supports narrowed-down document searches for obtaining useful document information.
[0038] Further, document information can be searched with whichever of the narrow search and the broad search is designated, and the thesaurus terms contained in the hit document information can be output as information for narrowing down medical document information. This supports narrowed-down document searches for obtaining useful document information.
[0039] Further, the thesaurus terms that have the input search terms as synonyms can be acquired, and the corresponding document information can then be searched in a single operation using the thesaurus terms designated from among those acquired. This supports narrowed-down document searches.
[0040] Further, according to the present invention, the classification and the term score can be output together with the document terms contained in the hit document information, as information for narrowing down document information. This supports narrowed-down document searches for obtaining useful document information, not only for medical documents.
[0041] Further, according to the present invention, the classification and the term score can be output together with the document terms contained in the hit document information, as information on relationships among molecules, diseases, and the like. This can encourage discoveries in medicine and molecular biology.
[0042] Further, the number of document terms extracted this time that are the same as document terms extracted in the past can be output, so the progression of the search can be seen. This supports narrowed-down document searches for obtaining useful document information, not only for medical documents.
Best Mode for Carrying Out the Invention
[0043] Embodiment 1.
An embodiment that outputs the classification and the term score together with the document terms contained in the hit document information, as information for narrowing down document information, is described below. The following description uses an example in which the "document information" is medical document information recorded in MEDLINE and the "thesaurus terms" are MeSH terms.
[0044] FIG. 1 is a configuration diagram of the document retrieval device 100 according to Embodiment 1.
The configuration of the document retrieval device 100 according to Embodiment 1 is described below with reference to FIG. 1.
[0045] The document retrieval device 100 comprises the following.
The user environment setting unit 101 inputs, for each user, various setting information for the document retrieval device 100 (hereinafter, environment setting information) from an input device, and stores the input environment setting information in the user information DB (database) 191 as information on that user.
ユーザ情報 DB191は、文献検索装置 100を使用するユーザ毎に、文献検索装置 100へのログイン ID (識別子)、パスワード、環境設定情報など、文献検索装置 100 を利用する際のユーザに関する情報 (以下、ユーザ情報とする)を記憶機器に記憶し 、ユーザ情報に対するアクセス管理をする。  For each user who uses the document search device 100, the user information DB 191 is information related to a user when using the document search device 100, such as a login ID (identifier), password, and environment setting information for the document search device 100 (hereinafter, User information) is stored in a storage device, and access management for user information is performed.
The MEDLINE update information acquisition unit 102 periodically accesses the FTP (File Transfer Protocol) site of the NLM and acquires information about document information newly recorded (or scheduled to be recorded) in MEDLINE 200 (hereinafter, update information). Based on the acquired update information, it performs update-related processing, such as updating the term DB (database) 192 described below and notifying users of updates to the document information in MEDLINE 200.
The term DB 192 stores in a storage device information about each MeSH term (hereinafter, MeSH term information) and manages access to it. The MeSH term information includes the Heading (hereinafter, representative term), the synonyms of the Heading (hereinafter, synonym terms), the MeSH TreeNumber (hereinafter, MeSH number), a description of the MeSH term, the medical-information category to which the term belongs (gene, disease, compound, tissue, or other; hereinafter, classification), and the number of documents related to the MeSH term among all documents recorded in MEDLINE (hereinafter, per-term document count in all documents).
The search result storage unit 193 is a storage device that stores two kinds of information. The first is the search result information of MEDLINE 200 (hereinafter, MEDLINE result information): the document information corresponding to a query (search statement) (hereinafter, hit document information), the total number of documents recorded in MEDLINE 200, and the number of hit documents. The second is the search result information of the document retrieval apparatus 100 (hereinafter, search result information): the MeSH terms extracted from the hit document information (including SCRs; hereinafter, extracted terms), information about the extracted terms other than the representative term (hereinafter, additional information), and information indicating the progression of the search (hereinafter, search transition information).
The display change unit 103 receives, from an input device, information about what to display for the search result (hereinafter, display designation information), outputs display information based on the display designation information (hereinafter, display change information), and thereby changes the display of the search result.
The login unit 104 receives, from an input device, the login ID and password specified by the user (hereinafter, login information), and controls the user's access to the document retrieval apparatus 100 based on the received login information and the user information stored in the user information DB 191.
[0046] The document search unit 110 generates a query based on the user's specification, accesses MEDLINE 200 with the generated query, and acquires MEDLINE result information from MEDLINE 200.
[0047] The search result generation unit 120 generates search result information based on the MEDLINE result information and stores the generated search result information in the search result storage unit 193.
[0048] The search result display unit 105 displays the search result on a display device based on the MEDLINE result information and the search result information, and changes the display of the search result based on the display change information.
[0049] FIG. 2 is a configuration diagram of the document search unit 110 according to Embodiment 1.
The configuration of the document search unit 110 according to Embodiment 1 is described below with reference to FIG. 2.
[0050] The document search unit 110 includes the following.
The synonym search unit 111 receives, from an input device, the term that the user first specifies for searching the document information in MEDLINE 200 (hereinafter, search term). It then acquires from the term DB 192 the representative terms of the MeSH terms that have the search term as their representative term or as a synonym term. Finally, it receives from an input device the term that the user designates, among the acquired representative terms, for use in the query that accesses MEDLINE 200 (hereinafter, designated term).
The search term input unit 112 receives the search term from an input device.
The representative term acquisition unit 113 acquires from the term DB 192 the representative terms of the MeSH terms that have the search term as their representative term or as a synonym term.
The designated term acquisition unit 114 receives the designated term from an input device.
The query generation unit 115 generates a query based on the designated term and the environment setting information.
The MEDLINE access unit 116 receives the query from an input device, accesses MEDLINE 200 with the received query, acquires MEDLINE result information from MEDLINE 200, and stores the acquired MEDLINE result information in the search result storage unit 193.
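The lookup performed by the representative term acquisition unit 113 can be sketched as follows. This is a minimal illustration, not the implementation described in the specification: the in-memory dictionary standing in for the term DB 192, the function name `find_representative_terms`, and the case-insensitive comparison are all assumptions. The "Neoplasms" entry follows the FIG. 4 example; the second entry is a hypothetical placeholder.

```python
# Hypothetical in-memory stand-in for the term DB 192:
# representative term (Heading) -> list of synonym terms.
TERM_DB = {
    "Neoplasms": ["cancers", "Benign"],      # values taken from the FIG. 4 example
    "PlaceholderHeading": ["placeholder"],   # hypothetical entry for illustration
}


def find_representative_terms(search_term, term_db):
    """Return every Heading whose Heading or synonyms equal the search term.

    Matching is case-insensitive here; that is an assumption of this sketch.
    """
    needle = search_term.casefold()
    hits = []
    for heading, synonyms in term_db.items():
        names = [heading, *synonyms]
        if any(needle == name.casefold() for name in names):
            hits.append(heading)
    return hits


candidates = find_representative_terms("Cancers", TERM_DB)
```

In the apparatus, the user would then pick the designated term from such a candidate list before the query is generated.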
[0051] FIG. 3 is a configuration diagram of the search result generation unit 120 according to Embodiment 1.
The configuration of the search result generation unit 120 according to Embodiment 1 is described below with reference to FIG. 3.
[0052] The search result generation unit 120 includes the following.
The MeSH term extraction unit 121 extracts MeSH terms from the hit document information (extracting a MeSH term is taken to include acquiring the representative term corresponding to the extracted MeSH term).
The additional information generation unit 122 generates additional information based on the MEDLINE result information.
The classification acquisition unit 123 acquires the classification of each extracted term from the term DB 192.
The term score calculation unit 124 calculates a score (term score) indicating the importance of an extracted term with respect to the hit documents.
The MeSH number acquisition unit 125 acquires the MeSH number of each extracted term from the term DB 192.
The search transition information generation unit 126 generates search transition information based on the current MEDLINE result information, extracted term information, and additional information, and on the past MEDLINE result information, extracted term information, and additional information.
[0053] FIG. 4 is a diagram showing the table of MeSH term information stored and managed by the term DB 192 according to Embodiment 1.
The table of MeSH term information stored and managed by the term DB 192 according to Embodiment 1 is described below with reference to FIG. 4.
The term DB 192 stores in advance, as MeSH term information, a table (hereinafter, MeSH term table) that holds for each MeSH term the "representative term (Heading)", "MeSH number", "term description", "classification", "per-term document count in all documents", and "synonym terms", and manages access to the MeSH term information. In FIG. 4, the "entry ID" identifies the MeSH term information of each MeSH term. The "classification" indicates which of the medical-information categories gene (Gene), disease (Disease), compound (Chemical), and tissue (Tissue) the MeSH term given by the "representative term (Heading)" belongs to. Gene is shown as "G", disease as "D", compound as "C", and tissue as "T".
[0054] FIG. 4 shows that the MeSH term whose representative term is "Neoplasms" is identified by the MeSH number "C04", is classified as "disease (D)", has "135284" related documents among all documents recorded in MEDLINE (per-term document count in all documents), and has synonym terms such as "cancers" and "Benign". It also shows that the information of the MeSH term whose representative term is "Neoplasms" is set in the entry identified by the entry ID "9028".
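One row of the MeSH term table can be pictured as a simple record. The sketch below is illustrative only: the class name `MeSHTermRecord` and its field names are assumptions, and the description field is left empty because the text does not quote it. The values are those given for "Neoplasms" in FIG. 4.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MeSHTermRecord:
    """Hypothetical in-memory form of one entry of the MeSH term table."""
    entry_id: int              # "entry ID"
    heading: str               # representative term (Heading)
    mesh_number: str           # MeSH TreeNumber
    description: str           # term description
    classification: str        # "G", "D", "C" or "T"
    docs_in_all: int           # per-term document count in all documents
    synonyms: List[str] = field(default_factory=list)


# The "Neoplasms" entry as described for FIG. 4.
neoplasms = MeSHTermRecord(
    entry_id=9028,
    heading="Neoplasms",
    mesh_number="C04",
    description="",            # not quoted in the text, so left empty here
    classification="D",
    docs_in_all=135284,
    synonyms=["cancers", "Benign"],
)
```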
[0055] Next, the method of setting the MeSH term information in the term DB 192 is described.
For the "entry ID", unique IDs are prepared and set, one for each MeSH term whose MeSH term information is stored and managed.
For "Heading", "MeSH number", "term description", and "synonym terms", the information stored in the MeSH database is set.
For the "per-term document count in all documents", the number of hit documents is set that is obtained when MEDLINE 200 is accessed with a query that combines, under an OR (logical sum) condition, the MeSH term set in "Heading" and all MeSH terms set in "synonym terms" of the entry.
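The OR-combined query used to obtain the per-term document count can be sketched as below. The function name `build_or_query` and the concrete syntax (double-quoted terms joined by ` OR `) are assumptions made for illustration; the text states only that the Heading and all synonym terms are combined under an OR condition.

```python
def build_or_query(heading, synonyms):
    """Join the Heading and all of its synonym terms with OR.

    The quoting style is an assumption of this sketch, not a syntax
    specified in the text.
    """
    terms = [heading, *synonyms]
    return " OR ".join('"{}"'.format(term) for term in terms)


query = build_or_query("Neoplasms", ["cancers", "Benign"])
```

The number of hits returned for such a query would then be stored as the per-term document count of the entry.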
[0056] Next, the method of setting the "classification" is described.
FIGS. 5, 6, 7, 8, and 9 are tree diagrams of the MeSH Descriptors.
In FIGS. 5 to 9, among the Descriptor terms at the MeSH numbers [D05], [D06], [D08] to [D12], [D14.600], [D17.350], [D24], and [D27.505] and in their respective lower layers, the lowest-layer Descriptor terms are treated as MeSH terms classified as genes (hereinafter, gene classification terms), and "G" is set for them in the term DB 192. SCR terms mapped to Descriptor terms classified as genes are also treated as gene classification terms, and "G" is set for them in the term DB 192.
The Descriptor terms at the MeSH numbers [C01] to [C04] and [C05] to [C23] and in their respective lower layers are treated as MeSH terms classified as diseases (hereinafter, disease classification terms), and "D" is set for them in the term DB 192.
Among the Descriptor terms at the MeSH numbers [D01] to [D04], [D07], [D14.100], [D15], [D16], [D17.025], [D17.050], [D18], [D19], [D26], [D27.720], and [D27.888] and in their respective lower layers, those for which a RegistryNumber is recorded are treated as MeSH terms classified as compounds (hereinafter, compound classification terms), and "C" is set for them in the term DB 192. SCR terms that are mapped to Descriptor terms classified as compounds and for which a RegistryNumber is recorded are also treated as compound classification terms, and "C" is set for them in the term DB 192.
The Descriptor terms at the MeSH numbers [A01] to [A08], [A10], [A11.063] to [A11.270], [A11.284.149], [A11.284.180], [A11.284.295], [A11.284.420], [A11.284.430.214], [A11.284.430.429], [A11.299] to [A11.868], [A12] to [A14], [C04.182], [C04.557.435] to [C04.557.665], and [C04.588] and in their respective lower layers are treated as MeSH terms classified as tissues (hereinafter, tissue classification terms), and "T" is set for them in the term DB 192.
[0057] Hereinafter, the following MeSH numbers are each referred to as classification index numbers (specific MeSH numbers) that serve as indices for classifying MeSH terms:
for gene classification terms, [D05], [D06], [D08] to [D12], [D14.600], [D17.350], [D24], and [D27.505];
for disease classification terms, [C01] to [C04] and [C05] to [C23];
for compound classification terms, [D01] to [D04], [D07], [D14.100], [D15], [D16], [D17.025], [D17.050], [D18], [D19], [D26], [D27.720], and [D27.888];
for tissue classification terms, [A01] to [A08], [A10], [A11.063] to [A11.270], [A11.284.149], [A11.284.180], [A11.284.295], [A11.284.420], [A11.284.430.214], [A11.284.430.429], [A11.299] to [A11.868], [A12] to [A14], [C04.182], [C04.557.435] to [C04.557.665], and [C04.588].
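The way the classification index numbers partition the MeSH tree can be sketched as a prefix-and-range test on the MeSH number. This is a simplified illustration under stated assumptions: the names `CLASS_INDEX` and `classify` and the `~` range notation are invented here; the sketch ignores the additional conditions in the text (lowest-layer Descriptors only for genes, a recorded RegistryNumber for compounds, and SCR mapping); and because the text does not state which category wins when a number falls under both a disease and a tissue index number, the sketch simply returns every matching category.

```python
# Classification index numbers from the text; "X~Y" denotes the range X to Y.
CLASS_INDEX = {
    "G": ["D05", "D06", "D08~D12", "D14.600", "D17.350", "D24", "D27.505"],
    "D": ["C01~C04", "C05~C23"],
    "C": ["D01~D04", "D07", "D14.100", "D15", "D16", "D17.025", "D17.050",
          "D18", "D19", "D26", "D27.720", "D27.888"],
    "T": ["A01~A08", "A10", "A11.063~A11.270", "A11.284.149", "A11.284.180",
          "A11.284.295", "A11.284.420", "A11.284.430.214", "A11.284.430.429",
          "A11.299~A11.868", "A12~A14", "C04.182", "C04.557.435~C04.557.665",
          "C04.588"],
}


def _segments(mesh_number):
    """Split a MeSH number such as 'A11.284.149' into its tree segments."""
    return tuple(mesh_number.split("."))


def _matches(mesh_number, spec):
    """True if mesh_number lies at or below an index number (or range)."""
    low, _, high = spec.partition("~")
    lo, hi = _segments(low), _segments(high or low)
    prefix = _segments(mesh_number)[:len(lo)]
    # Segments are fixed-width, so tuple/string comparison orders them correctly.
    return len(prefix) == len(lo) and lo <= prefix <= hi


def classify(mesh_number):
    """Return all categories whose index numbers cover mesh_number."""
    return [label for label, specs in CLASS_INDEX.items()
            if any(_matches(mesh_number, spec) for spec in specs)]
```

For example, `classify("C04")` yields only the disease category, while a number below [C04.557.435] to [C04.557.665] matches both the disease and the tissue index numbers in this sketch.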
[0058] FIG. 10 is a diagram showing the screen (search result screen) on which the search result display unit 105 according to Embodiment 1 displays the search result.
In FIG. 10, (1) is the main menu, (2) is the query, (3) is the submenu, (4) is the result, and (5) is the history.
[0059] The main menu (1) consists of buttons for switching the main function. The main functions shown are the document search function (Search), the synonym search function (Dictionary), the alert function (Alert), the environment setting function (Customize), the help function (Help), and the logout function (Logout).
When the user presses a button of the main menu (1) using the mouse 903 (an example of an input device), the document retrieval apparatus 100 runs the document search unit 110 and performs a document search if the pressed button is the document search function. If the pressed button is the synonym search function, the apparatus runs the synonym search unit 111 and performs a synonym search; if it is the alert function or the environment setting function, the apparatus runs the user environment setting unit 101 and configures the environment. If the pressed button is the help function, the apparatus displays its description information on the CRT display device 901 (an example of a display device). The description information is assumed to be stored in a storage device of the document retrieval apparatus 100. If the pressed button is the logout function, the apparatus does not execute any subsequent document search processing.
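The mapping from main menu buttons to the units that handle them can be summarized as a dispatch table. The sketch below is only a restatement of the paragraph above in code form; the names `MAIN_MENU_ACTIONS` and `dispatch` are assumptions, and the values are descriptive strings rather than real handlers.

```python
# Button label -> action described in the text for that button.
MAIN_MENU_ACTIONS = {
    "Search": "run document search unit 110",
    "Dictionary": "run synonym search unit 111",
    "Alert": "run user environment setting unit 101",
    "Customize": "run user environment setting unit 101",
    "Help": "display description information on CRT display device 901",
    "Logout": "stop executing subsequent document search processing",
}


def dispatch(button):
    """Return the action for a pressed main menu button.

    Raises KeyError for a label that is not part of the main menu.
    """
    return MAIN_MENU_ACTIONS[button]
```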
[0060] The query (2) is a text box for entering the search condition expression (a query, a search term, or a designated term) used to access MEDLINE and search for documents. The document retrieval apparatus 100 stores in a storage device the search condition expression that the user enters in the query (2) from the keyboard 902 (an example of an input device).
[0061] The submenu (3) consists of tabs for switching how the document search result is displayed. The display methods shown include the classification display of extracted terms (GDCT), the hit document display (Article), and the MeSH number display of extracted terms (MeSH).
When the user presses a tab of the submenu (3) using the mouse 903 (an example of an input device), the document retrieval apparatus 100 runs the display change unit 103 and changes the display of the document search result according to the information of the pressed tab (display designation information).
[0062] The result (4) is a display area that shows the document search result in the display method specified in the submenu (3).
[0063] The history (5) is a display area that shows the progression of the search.
[0064] Here, the document retrieval apparatus 100 is an example of a document retrieval apparatus that retrieves documents.
MEDLINE 200 is an example of a document database that stores a set of document information (for example, all document information recorded in MEDLINE 200).
The term DB 192 is an example of a classification database that stores each thesaurus term in association with one of the classifications gene, disease, compound, and tissue, of a hierarchy database that stores the hierarchy information (MeSH number) of thesaurus terms, and of a synonym database that stores the synonyms of thesaurus terms.
The MEDLINE access unit 116 is an example of a search statement input unit that receives a search statement from an input device and of a document information acquisition unit that acquires the document information corresponding to the received search statement from the document database.
The MeSH term extraction unit 121 is an example of a thesaurus term extraction unit and of a document term extraction unit that extract the document terms (for example, MeSH terms, including the representative terms corresponding to the extracted terms) contained in the acquired document information (hit document information).
The classification acquisition unit 123 is an example of a classification acquisition unit that acquires the classification of a thesaurus term from the classification database (term DB 192).
The search result display unit 105 is an example of an output unit that outputs the extracted terms (including the representative terms and MeSH numbers corresponding to the extracted terms), the classifications, the term scores, and the search transition to an output device.
The term score calculation unit 124 is an example of a term score calculation unit that calculates a score from two proportions: the proportion of acquired document information containing the thesaurus term among all acquired document information, taken relative to the proportion of document information containing the thesaurus term among all document information in the set.
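One straightforward reading of this description is a ratio of two proportions, sketched below. The function name `term_score` and its parameter names are assumptions, and the plain quotient is only one possible reading: this passage does not say whether the apparatus applies any further scaling (for example a logarithm) to the ratio.

```python
def term_score(hits_with_term, hits_total, all_with_term, all_total):
    """Ratio of the term's share among hit documents to its share overall.

    hits_with_term: hit documents that contain the term
    hits_total:     all hit documents
    all_with_term:  documents in the whole set that contain the term
    all_total:      all documents in the whole set
    """
    if hits_total <= 0 or all_with_term <= 0 or all_total <= 0:
        raise ValueError("document counts must be positive")
    share_in_hits = hits_with_term / hits_total
    share_in_all = all_with_term / all_total
    return share_in_hits / share_in_all


# A term found in 1 of 4 hits but only 1 of 8 documents overall is
# twice as frequent among the hits as in the whole set.
example = term_score(1, 4, 1, 8)
```

Under this reading, a score above 1 means the term is over-represented in the hit documents relative to the whole document set, and a score below 1 means it is under-represented.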
The search transition information generation unit 126 is an example of a duplicate thesaurus term count calculation unit and of a duplicate document term count calculation unit that calculate, among the currently extracted document terms (for example, MeSH terms), the number of document terms identical to past document terms.
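Counting the duplicate terms amounts to a set intersection between the current and the past extraction. The sketch below assumes terms are compared as exact strings; the function name `count_duplicate_terms` and the sample terms other than "Neoplasms" are hypothetical.

```python
def count_duplicate_terms(current_terms, past_terms):
    """Number of distinct currently extracted terms that also appeared before."""
    return len(set(current_terms) & set(past_terms))


# Hypothetical extracted terms from a past and a current search.
past = ["Neoplasms", "TermA", "TermB"]
current = ["Neoplasms", "TermB", "TermC"]
overlap = count_duplicate_terms(current, past)
```

A high overlap between successive searches suggests the narrowing-down is converging on the same set of terms, which is the kind of search progression the history area is meant to show.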
The display change unit 103 is an example of a designated hierarchy input unit that receives the designation of a thesaurus term hierarchy level, of a thesaurus hierarchy acquisition unit that acquires the hierarchy information of thesaurus terms from the hierarchy database, and of a thesaurus term selection unit that selects the thesaurus terms corresponding to the hierarchy level received by the designated hierarchy input unit, based on the hierarchy information acquired by the thesaurus hierarchy acquisition unit.
The document search unit 110 is an example of a search method input unit that receives the designation of one of two search methods: a narrow search method, whose search condition is the thesaurus terms contained in the search statement, and a broad search method, whose search condition is the thesaurus terms contained in the search statement together with the thesaurus terms in their lower layers. Here, the environment setting information included in the user information contains setting information for selecting either the narrow search or the broad search (hereinafter, search method information), and the document search unit 110 reads the search method information from the user information DB 191.
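The difference between the narrow and the broad search method can be sketched as an optional expansion over the MeSH hierarchy. The sketch assumes each term has a single MeSH number and treats a term as "lower-layer" when its number starts with the parent's number followed by a dot. The function name `expand_terms`, the mode strings, and the entries other than "Neoplasms" ([C04]) are hypothetical.

```python
# Hypothetical hierarchy data: term -> MeSH number.
MESH_NUMBERS = {
    "Neoplasms": "C04",        # from the FIG. 4 example
    "ChildTermA": "C04.182",   # hypothetical lower-layer term
    "OtherTerm": "C01",        # hypothetical unrelated term
}


def expand_terms(term, mesh_numbers, mode):
    """Return the terms to search for under the given search method.

    mode "narrow": only the term itself.
    mode "broad":  the term plus every term located below it in the tree.
    """
    if mode == "narrow":
        return [term]
    if mode != "broad":
        raise ValueError("mode must be 'narrow' or 'broad'")
    root = mesh_numbers[term]
    return [name for name, number in mesh_numbers.items()
            if number == root or number.startswith(root + ".")]


broad = expand_terms("Neoplasms", MESH_NUMBERS, "broad")
```

In the apparatus the mode would come from the search method information stored per user, so the same query text can yield different search conditions for different users.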
The representative term acquisition unit 113 is an example of a synonym term acquisition unit that acquires from the synonym database the thesaurus terms that have the search word as a synonym and displays them on a display device.
The designated term acquisition unit 114 is an example of a search word designation unit that lets the user designate, from the displayed thesaurus terms, the thesaurus term to be used as the search word.
[0065] FIG. 11 is a diagram showing the external appearance of the document retrieval apparatus 100 according to Embodiment 1.
In FIG. 11, the document retrieval apparatus 100 includes a system unit 910, a CRT (Cathode Ray Tube) display device 901, a keyboard (K/B) 902, a mouse 903, a compact disc device (CDD) 905, a printer device 906, and a scanner device 907, which are connected by cables.
The document retrieval apparatus 100 is further connected by cables to a FAX machine 932 and a telephone 931, and is connected to the Internet 940 via a local area network (LAN) 942 and a web server 941. Via the LAN 942 and the Internet 940, the document retrieval apparatus 100 accesses the MEDLINE database server 200 (hereinafter, MEDLINE 200), the OMIM database server 210 (hereinafter, OMIM 210), and the MeSH database server 220 (hereinafter, MeSH 220).
[0066] FIG. 12 is a hardware configuration diagram of the document retrieval apparatus 100 according to Embodiment 1.
In FIG. 12, the document retrieval apparatus 100 includes a CPU (Central Processing Unit; also called a central processing device or simply a processing device) 911 that executes programs. The CPU 911 is connected via a bus 912 to a ROM 913, a RAM 914, a communication board 915, the CRT display device 901, the K/B 902, the mouse 903, an FDD (Flexible Disk Drive) 904, a magnetic disk device 920, the CDD 905, the printer device 906, and the scanner device 907. The CPU 911 executes the operations of each "... unit" described below. That is, the CPU 911 executes operations such as storage, input and output, communication, and display.
The RAM 914 is an example of a volatile memory. The ROM 913, the FDD 904, the CDD 905, the magnetic disk device 920, and an optical disk device are examples of nonvolatile memories. These are examples of a storage device or a storage unit. The user information DB 191, the term DB 192, the search result storage unit 193, MEDLINE 200, OMIM 210, and MeSH 220 store in storage devices, respectively, the user information, the MeSH term information, the MEDLINE result information and search result information, the MEDLINE information, the OMIM information, and the MeSH information. However, the MEDLINE 200, OMIM 210, and MeSH 220 databases may be external databases connected to the Internet 940.
The communication board 915 is connected to the FAX machine 932, the telephone 931, the LAN 942, and the like.
For example, the communication board 915, the K/B 902, the scanner device 907, the FDD 904, and the storage devices are examples of an input device or an input unit. The user environment setting unit 101, the MEDLINE update information acquisition unit 102, the display change unit 103, and the login unit 104 receive from an input device, respectively, the environment setting information, the update information, the display designation information, and the login information. The synonym search unit 111 receives the search term and the designated term from an input device.
Likewise, the communication board 915, the CRT display device 901, and the storage devices are examples of an output device or an output unit. The synonym search unit 111 and the search result display unit 105 output, respectively, the representative terms and the search result to an output device.
[0067] Here, the communication board 915 is not limited to being connected to the LAN 942 and may be connected directly to the Internet 940 or to a WAN (wide area network) such as ISDN. In that case, the document retrieval apparatus 100 is connected to the Internet 940 or to the WAN such as ISDN, and the web server 941 is unnecessary.
The magnetic disk device 920 stores an operating system (OS) 921, a window system 922, a program group 923, and a file group 924. The program group 923 is executed by the CPU 911, the OS 921, and the window system 922.
[0068] The program group 923 stores the programs that execute the functions described as "... unit", "... DB", and "... storage unit" in the description of the embodiments below. The programs are read and executed by the CPU 911.
The file group 924 stores, as "~ files", the result data described in the embodiments below with expressions such as "determines ~", "result of determining ~", "calculates ~", "result of calculating ~", "processes ~", and "result of processing ~", as well as the data described as "~ information". Information on the buttons, tabs, list boxes, text boxes, and the like that the user designates on the screen is also stored as "~ files".
The arrows in the flowcharts described in the embodiments below mainly indicate data input and output. For this input and output, the data is recorded on the magnetic disk device 920, an FD (flexible disk cartridge), an optical disc, a CD (compact disc), an MD (MiniDisc), a DVD (Digital Versatile Disc), or another recording medium, or it is transmitted over a signal line or another transmission medium.
[0069] What is described as a "~ unit", "~ DB", or "~ storage unit" in the embodiments below may be realized by firmware stored in the ROM 913. Alternatively, it may be implemented by software alone, by hardware alone, by a combination of software and hardware, or by a combination that further includes firmware.
[0070] The programs that implement the embodiments described below may be stored using a recording device based on the magnetic disk device 920, an FD, an optical disc, a CD, an MD, a DVD, or another recording medium.
[0071] FIG. 13 is a flowchart showing the flow of the document search process of the document retrieval device 100 according to Embodiment 1.
An overview of the document search process of the document retrieval device 100 according to Embodiment 1 is given below with reference to FIG. 13.
[0072] First, the login unit 104 performs login processing for a user logging in to the document retrieval device 100 (S101).
Next, the document search unit 110 performs a document search process in which it accesses MEDLINE 200 and acquires MEDLINE search information in accordance with the user's instructions (S102). Next, the search result generation unit 120 performs a search result generation process that generates a search result based on the MEDLINE search information acquired by the document search unit 110 (S103).
Then, the search result display unit 105 performs a search result display process that displays the search result on the CRT display device 901 (an example of a display device) based on the search result information generated by the search result generation unit 120 (S104).
Thereafter, the document search unit 110 repeats S102 to S104 in accordance with the user's instructions, supporting the refinement search.
After the refinement search is finished, the user can obtain the narrowed-down, useful document information by selecting the hit document display (Article) in submenu (3) of the search result screen. At this time, the display change unit 103 acquires the MEDLINE result information from the search result storage unit 193, and the search result display unit 105 displays the hit document information indicated by the MEDLINE result information on the display device as the narrowed-down, useful document information.
[0073] Next, the details of the document search process (S101 to S104) of the document retrieval device 100 are described below.
[0074] First, the login process (S101) of the login unit 104 is described.
In the login process (S101), the login unit 104 displays text boxes for entering a login ID and a password on the CRT display device 901 (an example of a display device) and prompts the user to enter them. The login unit 104 then receives the login ID and password that the user enters from the keyboard 902 (an example of an input device). The login unit 104 outputs the received login ID to access the user information DB 191, and the user information DB 191 outputs a user information search result using that login ID as the search key. The user information DB 191 stores, as user information, the registered login ID of each registered user, together with the registered password, the user's environment setting information, and the like, in a storage device in association with the login ID. The user information DB 191 searches for user information whose registered login ID matches the search key and outputs, as the search result, whether such user information exists and, if it does, the matching user information.
When the search result output by the user information DB 191 indicates that the user information exists, the login unit 104 compares the password entered by the user with the password in the user information included in the search result. If the two passwords match, the login unit 104 permits the user to access the document retrieval device 100, and the document search process based on the user's subsequent instructions is executed. If the search result indicates that the user information does not exist, or if the two passwords do not match, the login unit 104 denies the user access to the document retrieval device 100 and the subsequent document search process is not executed.
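The login check described above can be sketched as follows. This is only an illustration under stated assumptions: the class and function names (`UserInfo`, `UserInfoDB`, `login`) and the in-memory dictionary standing in for the user information DB 191 are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class UserInfo:
    """One registered user record (field names are illustrative)."""
    login_id: str
    password: str
    settings: dict = field(default_factory=dict)


class UserInfoDB:
    """Hypothetical in-memory stand-in for the user information DB 191."""

    def __init__(self, users):
        self._by_id = {user.login_id: user for user in users}

    def search(self, login_id):
        """Return (found, record): presence flag plus the matching record."""
        record = self._by_id.get(login_id)
        return record is not None, record


def login(db, login_id, password):
    """Permit access only if the ID is registered and the password matches."""
    found, record = db.search(login_id)
    if not found:
        return False  # user information absent: access denied
    return record.password == password


# Sample data is invented for the example.
db = UserInfoDB([UserInfo("user01", "s3cret")])
```

Calling `login(db, "user01", "s3cret")` grants access, while an unknown ID or a wrong password is refused, matching the two denial cases in the text.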
[0075] Next, the document search process (S102) of the document search unit 110 is described.
In the document search process (S102), when the login unit 104 has permitted the user to access the document retrieval device 100 and the permitted user selects the document search function (Search) in main menu (1) of the search result screen, the MEDLINE access unit 116 of the document search unit 110 receives the query specified by the user from an input device. For example, it receives the query that the user typed from the keyboard 902 (an example of an input device) into query (2) of the search result screen, or it obtains a file containing the user-specified query from a storage device (an example of an input device) and reads the query from that file. The MEDLINE access unit 116 then accesses MEDLINE 200 with the received query, acquires MEDLINE result information from MEDLINE 200, and stores it in the search result storage unit 193.
[0076] Next, the search result generation process (S103) of the search result generation unit 120 is described.
First, the MeSH term extraction unit 121 of the search result generation unit 120 performs a MeSH term extraction process that extracts MeSH terms from the hit document information included in the MEDLINE result information acquired by the document search unit 110.
Then, the additional information generation unit 122 of the search result generation unit 120 performs an additional information generation process that generates additional information for each MeSH term extracted by the MeSH term extraction unit 121.
[0077] FIG. 14 is a flowchart showing the flow of the MeSH term extraction process of the MeSH term extraction unit 121 in Embodiment 1.
The MeSH term extraction process of the MeSH term extraction unit 121 within the search result generation process (S103) of the search result generation unit 120 is described below with reference to FIG. 14.
[0078] First, the MeSH term extraction unit 121 acquires the hit document information from the search result storage unit 193 (S201).
Next, in order to perform S203 to S205 on each item of acquired hit document information, the MeSH term extraction unit 121 determines whether all of the acquired hit document information has been processed (S202).
If unprocessed hit document information remains, the MeSH term extraction unit 121 extracts the MeSH terms, as extracted terms, from the hit document information to be processed (any one item of the unprocessed hit document information) (S203).
Next, in order to perform S205 on each extracted term, the MeSH term extraction unit 121 determines whether all of the extracted terms have been processed (S204).
[0079] If an unprocessed extracted term remains, the MeSH term extraction unit 121 generates the extracted term list (a list of the representative terms corresponding to the extracted terms) by adding the representative term that corresponds to the extracted term to be processed (any one of the unprocessed extracted terms). It also sets, in the extracted term list, the number of occurrences of the representative term corresponding to the extracted term. This number of occurrences indicates how many of the hit documents are related to the MeSH term (hereinafter, the number of documents per term among the hit documents). To do this, the MeSH term extraction unit 121 outputs the extracted term to be processed and accesses the term DB 192. As shown in FIG. 4, the term DB 192, which stores MeSH term information, uses the extracted term output by the MeSH term extraction unit 121 as a search key, searches for MeSH term information (entries) whose representative term or synonymous term matches the search key, and outputs the matching MeSH term information as the search result. The MeSH term extraction unit 121 compares the representative term indicated by the MeSH term information output by the term DB 192 (hereinafter, the representative term to be processed) with the representative terms already in the extracted term list and determines whether the same representative term is already present. If it is not present, the MeSH term extraction unit 121 adds the representative term to be processed to the extracted term list and sets the associated number of documents per term among the hit documents to 1. If it is present, the MeSH term extraction unit 121 does not add the representative term again and instead increments by 1 the number of documents per term among the hit documents stored for that representative term. If the term DB 192 outputs more than one item of MeSH term information, the MeSH term extraction unit 121 applies the same procedure to the representative term indicated by each item, setting the representative term and its number of documents per term among the hit documents in the extracted term list. The process then returns to S204 (S205).
If no unprocessed extracted term remains in S204 (all extracted terms have been processed), the MeSH term extraction unit 121 returns to S202.
If no unprocessed hit document information remains in S202 (all acquired hit document information has been processed), the MeSH term extraction unit 121 ends the MeSH term extraction process.
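The nested loop of S202 to S205 can be sketched as below. This is a minimal illustration, not the specification's implementation: the function name, the dictionary standing in for the term DB 192, and the sample terms are all assumptions made for the example. The sketch follows the described steps literally, so it assumes that a single document does not list two synonyms of the same representative term (otherwise that document would be counted twice).

```python
def build_extracted_term_list(hit_documents, term_db):
    """Count, per representative term, how many hit documents carry it.

    hit_documents: iterable of lists of MeSH terms, one list per hit document.
    term_db: dict mapping a representative or synonymous term to the list of
             matching representative terms (hypothetical stand-in for the
             term DB 192, which may return more than one entry).
    """
    term_list = {}  # representative term -> documents per term among the hits
    for doc_terms in hit_documents:                 # S202: each hit document
        for extracted in doc_terms:                 # S203/S204: each extracted term
            for rep in term_db.get(extracted, []):  # S205: look up representative term(s)
                if rep not in term_list:
                    term_list[rep] = 1              # not yet listed: add with count 1
                else:
                    term_list[rep] += 1             # already listed: increment by 1
    return term_list


# Sample entries are illustrative only.
term_db = {
    "Neoplasms": ["Neoplasms"],
    "Cancer": ["Neoplasms"],
    "Tumor Suppressor Protein p53": ["Tumor Suppressor Protein p53"],
    "p53": ["Tumor Suppressor Protein p53"],
}
hits = [["Cancer", "p53"], ["Neoplasms"], ["p53"]]
counts = build_extracted_term_list(hits, term_db)
```

Because synonyms are folded into their representative term before counting, "Cancer" and "Neoplasms" contribute to the same entry in `counts`.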
[0080] FIG. 15 is a flowchart showing the flow of the additional information generation process of the additional information generation unit 122 in Embodiment 1.
The additional information generation process of the additional information generation unit 122 within the search result generation process (S103) of the search result generation unit 120 is described below with reference to FIG. 15.
[0081] In order to perform S302 to S304 on every representative term set in the extracted term list generated by the MeSH term extraction unit 121, the additional information generation unit 122 determines whether all representative terms in the extracted term list have been processed (S301).
If an unprocessed representative term remains, the additional information generation unit 122 outputs the representative term to be processed (any one of the unprocessed representative terms) and accesses the term DB 192. As shown in FIG. 4, the term DB 192, which stores MeSH term information, uses the representative term output by the additional information generation unit 122 as a search key, searches for MeSH term information (entries) whose representative term matches the search key, and outputs the matching MeSH term information as the search result. The additional information generation unit 122 then sets the description of the term indicated by the MeSH term information output by the term DB 192 in the extracted term list in association with the representative term to be processed (S302).
Next, the classification acquisition unit 123 of the additional information generation unit 122 performs a classification acquisition process that sets the classification indicated by the MeSH term information output by the term DB 192 (an example of additional information) in the extracted term list in association with the representative term to be processed (S303).
The term score calculation unit 124 of the additional information generation unit 122 then performs a term score calculation process that calculates the term score of the representative term to be processed and sets the calculated term score (an example of additional information) in the extracted term list in association with that representative term (S304).
If no unprocessed representative term remains in S301, the additional information generation unit 122 stores the extracted term list, with the additional information set, in the search result storage unit 193 as the search result information (S305).
[0082] FIG. 16 is a flowchart showing the flow of the term score calculation process of the term score calculation unit 124 in Embodiment 1.
The details of the term score calculation process (S304) of the term score calculation unit 124 are described below with reference to FIG. 16.
[0083] First, the term score calculation unit 124 calculates the term score S(t) of the representative term t to be processed using the following formula (S401).
[0084] [Equation 1]
$$S(t) = \log_2 \frac{p_t}{q_t} = \log_2 p_t - \log_2 q_t$$

$$p_t = \frac{\text{number of documents per term among the hit documents}}{\text{number of hit documents}}, \qquad q_t = \frac{\text{number of documents per term among all documents}}{\text{total number of documents in MEDLINE}}$$
[0085] The total number of documents in MEDLINE and the number of hit documents used in the above formula are obtained from the MEDLINE result information stored in the search result storage unit 193. The number of documents per term among all documents is obtained from the MeSH term information output by the term DB 192 in S302. The number of documents per term among the hit documents is obtained from the extracted term list.
In the above formula, the odds are calculated by taking the density of the representative term to be processed in the hit document information (p_t) as the reference distribution and its density in all document information (q_t) as the measured distribution, and the log odds ratio (Lod, log of odds) is used as the term score. Odds are only one example of scoring, however, and the term score may be calculated by another scoring method.
Consequently, the lower the frequency of the MeSH term in all document information of MEDLINE 200 and the higher its frequency in the hit document information, the higher its term score. In other words, a higher term score indicates that the MeSH term is more important.
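The log-odds score S(t) = log2(p_t / q_t) can be illustrated with a short sketch. The function and parameter names are assumptions made for this example, and the counts are invented rather than taken from MEDLINE.

```python
import math


def term_score(hit_term_docs, hit_docs, all_term_docs, all_docs):
    """Log-odds term score S(t) = log2(p_t / q_t)."""
    p_t = hit_term_docs / hit_docs   # density of the term among the hit documents
    q_t = all_term_docs / all_docs   # density of the term among all documents
    return math.log2(p_t / q_t)


# Hypothetical counts: the term appears in 50 of 200 hit documents
# and in 1,000 of 16,000,000 documents overall.
score = term_score(50, 200, 1_000, 16_000_000)
```

A term that is rare overall but frequent among the hits gets a large positive score, while a term whose density among the hits equals its overall density scores 0.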
[0086] Next, the term score calculation unit 124 of the additional information generation unit 122 adjusts the provisionally calculated term score (S402 to S404).
[0087] In S402, the number of hit documents is compared with a specific first threshold, and in S403, the number of documents per term among the hit documents for the representative term to be processed is compared with a specific second threshold. Then, in S404, if the number of hit documents is less than the first threshold and the number of documents per term among the hit documents is less than the second threshold, the term score of the representative term to be processed is changed to −∞. The first threshold for the number of hit documents and the second threshold for the number of documents per term among the hit documents are stored in advance in a storage device of the document retrieval device 100. Multiple first or second thresholds may be provided so that the term score is calculated under multiple conditions. For example, a third threshold for the number of documents per term among the hit documents may be provided and, in addition to the above condition, the term score of the representative term to be processed may be changed to −∞ when the number of hit documents is equal to or greater than the first threshold and the number of documents per term among the hit documents is less than the third threshold. The adjustment is also not limited to setting the term score to −∞ when the condition is met. For example, the term score may be reduced by a specific ratio or have a specific value subtracted from it when the condition is met, or it may be increased by a specific ratio or have a specific value added to it when the condition is not met.
[0088] A term score calculated in S401 that is high only because the number of hit documents is small does not necessarily indicate that the MeSH term is important. In particular, a MeSH term is considered unimportant when the number of hit documents is small and its number of documents per term among the hit documents is also small. For this reason, S402 to S404 adjust the term scores of such MeSH terms downward.
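The basic adjustment of S402 to S404 can be sketched as follows. The function name, parameter names, and threshold values are hypothetical; the specification only states that the thresholds are stored in advance.

```python
import math


def adjust_score(score, hit_docs, hit_term_docs, min_hit_docs, min_term_docs):
    """Return -inf when both counts fall below their thresholds (S402 to S404).

    min_hit_docs: first threshold, applied to the number of hit documents.
    min_term_docs: second threshold, applied to the number of documents
                   per term among the hit documents.
    """
    if hit_docs < min_hit_docs and hit_term_docs < min_term_docs:
        return -math.inf
    return score


# Illustrative thresholds only: fewer than 10 hits and fewer than 2 documents.
suppressed = adjust_score(5.0, hit_docs=8, hit_term_docs=1,
                          min_hit_docs=10, min_term_docs=2)
kept = adjust_score(5.0, hit_docs=800, hit_term_docs=1,
                    min_hit_docs=10, min_term_docs=2)
```

Using −∞ means a suppressed term always sorts below every other term; the scaling or subtraction variants mentioned in the text would replace the `return -math.inf` line.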
[0089] The term score calculation unit 124 then sets the term score in the extracted term list in association with the representative term to be processed (S405).
[0090] Next, the search result display process (S104) of the search result display unit 105 is described.
The search result display unit 105 acquires the extracted term list (search result information) from the search result storage unit 193 and displays result (4) of the search result screen, for example as shown in FIG. 17.
[0091] FIG. 17 is a diagram showing result (4) of the search result screen in Embodiment 1.
Result (4) of the search result screen when the user has selected the extracted term classification display (GDCT) in submenu (3) of the search result screen is described below with reference to FIG. 17.
[0092] In FIG. 17, [1] is a check box, [2] is the term score (Score), [3] is the number of documents per term among the hit documents (Frequency), [4] is the representative term (Heading), [5] is the description (Description), [6] is the OMIM icon, and [7] is the PubMed icon.
[0093] The search result display unit 105 displays the above items ([1] to [7]) for the extracted terms grouped by classification, according to the classification of each extracted term that the classification acquisition unit 123 obtained and set in the extracted term list. FIG. 17 shows the display of the extracted terms classified as genes (Gene).
[0094] The check box [1] selects a MeSH term to be added to the query as a search condition for the refinement search.
The query generation unit 115 generates a query in which the representative terms [4] whose check boxes the user has checked with the mouse 903 (an example of an input device) are added as AND (logical product), OR (logical sum), or NOT (negation) conditions.
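A minimal sketch of this query refinement is shown below. The function name and the query syntax (parenthesized base query, quoted terms, upper-case Boolean operators) are assumptions for illustration; the specification does not state the exact query format that the query generation unit 115 produces.

```python
def refine_query(query, terms, operator="AND"):
    """Append the checked representative terms to the query with one operator."""
    if operator not in ("AND", "OR", "NOT"):
        raise ValueError(f"unsupported operator: {operator}")
    refined = f"({query})"
    for term in terms:
        refined += f' {operator} "{term}"'
    return refined


# Sample query and terms are invented for the example.
refined = refine_query("breast cancer", ["BRCA1 Protein", "Mutation"])
```

Here `refined` is `(breast cancer) AND "BRCA1 Protein" AND "Mutation"`; passing `operator="NOT"` would exclude the checked terms instead.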
[0095] The term score (Score) [2] shows the term score, the number of documents per term among the hit documents (Frequency) [3] shows that count, the representative term (Heading) [4] shows the representative term, and the description (Description) [5] shows the description of the representative term [4].
[0096] The OMIM icon [6] requests display of the OMIM search result page that uses the MeSH term (representative term [4]) as a search condition, and the PubMed icon [7] requests display of the PubMed search result page that uses the MeSH term (representative term [4]) as a search condition.
When the user clicks the OMIM icon [6] or the PubMed icon [7] with the mouse 903 (an example of an input device), the query generation unit 115 generates a query in which the MeSH term is added as an AND condition to the current query used to access MEDLINE. Next, the MEDLINE access unit 116 generates a URL (including the query) for accessing the search page (OMIM or PubMed) corresponding to the clicked icon, accesses it via the Internet, and acquires the information of the search result page corresponding to the query generated by the query generation unit 115. The search result display unit 105 then displays, based on the information acquired by the MEDLINE access unit 116, the search result page that OMIM or PubMed provides on the Internet.
[0097] In result (4) of the search result screen, when the user clicks one of the item names term score [2] (Score), number of documents per term among the hit documents [3] (Frequency), or representative term [4] (Heading) with the mouse 903 (an example of an input device), the display change unit 103 sorts the search result information according to the clicked item (display designation information), and the search result display unit 105 changes the display of the search result based on the sorted search result information (display change information).
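Sorting by the clicked column can be sketched as below. The row layout, the function name, and the default descending order are assumptions for this example; the specification does not state the sort direction.

```python
def sort_results(rows, column, descending=True):
    """Return a new list of result rows ordered by the clicked column."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


# Rows are invented sample data using the column names shown in FIG. 17.
rows = [
    {"heading": "Term B", "score": 2.0, "frequency": 5},
    {"heading": "Term A", "score": 7.5, "frequency": 1},
    {"heading": "Term C", "score": 0.5, "frequency": 9},
]
by_score = sort_results(rows, "score")
by_heading = sort_results(rows, "heading", descending=False)
```

Because `sorted` returns a new list, the stored search result information stays unchanged and only the displayed order differs.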
[0098] Embodiment 1 above has explained that additional information (classification and term score) can be output together with the extracted terms (which here also means the representative terms corresponding to the extracted terms) as information for narrowing down document information.
[0099] It has also been explained that the extracted terms are displayed with their additional information and that an extracted term can be added to the search conditions by checking the check box [1] in result (4) of the search result screen. In other words, search conditions for narrowing down document information can be added easily.
[0100] It has also been explained that MeSH terms are classified according to the classification index numbers (specific MeSH numbers).
[0101] また、 MEDLINE200の全文献情報における当該 MeSHタームの出現頻度が低く 、かつヒット文献情報における当該 MeSHタームの出現頻度が高いほどタームスコア を高くすることを特徴とすることを説明した。 [0101] In addition, the frequency of occurrence of the MeSH term in all MEDLINE200 document information is low. In addition, we explained that the higher the frequency of occurrence of the MeSH term in hit literature information, the higher the term score.
[0102] また、ヒット文献数が少ないヒット文献中ターム毎文献数も少ない当該 MeSHターム のタームスコアを低くすることを特徴とすることを説明した。 [0102] Furthermore, it has been explained that the MeSH term has a low term score in the hit document with a small number of hit documents and the number of documents per term is small.
[0103] In Embodiment 1 above, the MeSH term extraction unit 121 may generate an extracted term list that indicates the extracted terms themselves rather than the representative terms corresponding to them, and the additional information generation unit 122 may generate the additional information based on that list. That is, the extracted terms themselves and their additional information may be output. However, because a representative term uniquely identifies a MeSH term, outputting the representative term corresponding to each extracted term together with its additional information lets the user specify a more appropriate MeSH term as a search condition for a narrowing search.
[0104] In Embodiment 1 above, the document information need not be document information recorded in MEDLINE 200, nor need it be information on medical literature.
[0105] Likewise, in Embodiment 1 above, the document terms need not be MeSH terms, nor need they be medical terms.
[0106] The classification described in Embodiment 1 above may also be specified in the query as a search condition. That is, by specifying a MeSH term together with a classification, only the representative terms of that classification are used as search conditions, even when several representative terms have the specified MeSH term as a synonym term. This lets the user specify a more appropriate MeSH term as a search condition for a narrowing search. In this case, the query generation unit 115 acquires from the term DB 192 the representative terms that have the MeSH term in the specified query as their representative term or synonym term and that belong to the classification specified in the query, and generates the query by replacing that MeSH term with the acquired representative terms.
[0107] In Embodiment 1 above, the setting of the "term description" (S302) and of the "classification" (S303) in the extracted term list may instead be performed when the extracted term list is generated (S205). This reduces the number of accesses to the term DB 192 and shortens the processing time.
[0108] As described above, Embodiment 1 can support narrowed-down document searches for obtaining useful document information.
[0109] Embodiment 2.
A mode is described below in which, for the MeSH terms contained in the hit document information, the classification and term score are output together with the MeSH number as information for narrowing down document information. Only the parts that differ from Embodiment 1 are described; the remaining parts are the same as in Embodiment 1.
[0110] In S303 of the document search process described in Embodiment 1, the MeSH number acquisition unit 125 of the additional information generation unit 122 performs a MeSH number acquisition process, and the search result is output with the MeSH number corresponding to the representative term of each extracted term as additional information. In the MeSH number acquisition process, the MeSH number (an example of additional information) indicated by the MeSH term information output by the term DB 192 is set in the extracted term list in association with the representative term being processed. The other processes and the configuration of the document retrieval apparatus 100 in Embodiment 2 are the same as in Embodiment 1.
[0111] FIG. 18 shows the result area (4) of the search result screen in Embodiment 2.
The result area (4) displayed when the user selects the MeSH number display (MeSH) for extracted terms in the submenu (3) of the search result screen is described below with reference to FIG. 18.
[0112] In FIG. 18, [1] is a list box for selecting the hierarchy level of the MeSH terms to display, [2] shows the MeSH number, and [3] is a MeSH icon for requesting display of the MeSH search result page. The rest of the screen layout is the same as in Embodiment 1.
[0113] In FIG. 18, when the user uses the mouse 903 (an example of an input device) to specify in the list box [1] the hierarchy level of the MeSH terms to display, the display changing unit 103 outputs an extracted term list (search result information) for the extracted terms corresponding to the specified hierarchy level (display change information). The search result display unit 105 updates the displayed search results based on the search result information (display change information) output from the display changing unit 103.
When a MeSH number is specified in the list box, the display changing unit 103 compares the specified MeSH number against the MeSH number of each extracted term in the extracted term list by prefix match, and outputs an extracted term list of the extracted terms that match. Extracted terms whose MeSH numbers begin with the specified MeSH number are the extracted terms at the specified level and in the levels below it. Alternatively, the display changing unit 103 may output an extracted term list of the extracted terms whose MeSH numbers exactly match the specified MeSH number; these are the extracted terms at the specified level only.
When a MeSH term is specified in the list box, the display changing unit 103 acquires the MeSH number of the specified MeSH term from the term DB 192 or from the extracted term list, and then outputs the extracted term list corresponding to the specified level in the same way as when a MeSH number is specified.
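The prefix-match and exact-match filtering described for the list box [1] might look like the following minimal sketch. The function name `filter_by_level`, the dictionary layout, and the sample MeSH numbers are assumptions chosen for illustration and are not taken from the source.

```python
def filter_by_level(term_list, mesh_number, include_lower_levels=True):
    """Select extracted terms by MeSH number.

    include_lower_levels=True  -> prefix match (specified level and below)
    include_lower_levels=False -> exact match (specified level only)
    """
    if include_lower_levels:
        return [t for t in term_list if t["mesh_number"].startswith(mesh_number)]
    return [t for t in term_list if t["mesh_number"] == mesh_number]


# Hypothetical extracted term list; the values are illustrative only.
extracted = [
    {"heading": "Neoplasms", "mesh_number": "C04"},
    {"heading": "Neoplasms by Site", "mesh_number": "C04.588"},
    {"heading": "Digestive System Diseases", "mesh_number": "C06"},
]

level_and_below = [t["heading"] for t in filter_by_level(extracted, "C04")]
level_only = [t["heading"] for t in filter_by_level(extracted, "C04", False)]
```

A plain string prefix test is used because that is what the description states; it relies on the MeSH numbers of one level not being string prefixes of unrelated numbers at the same level.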
[0114] When the user uses the mouse 903 (an example of an input device) to click the MeSH icon [3], the MEDLINE access unit 116 generates a URL that contains the corresponding MeSH number [2] as a query, accesses the search page of the MeSH database on the Internet, and acquires the information of the search result page. The search result display unit 105 then displays the search result page provided by the MeSH database on the Internet, based on the information acquired by the MEDLINE access unit 116.
[0115] Embodiment 2 above described that, of the MeSH terms contained in the hit document information, those at a specified hierarchy level can be output as information for narrowing down medical document information. This supports narrowed-down document searches for obtaining useful document information.
[0116] In other respects, Embodiment 2 supports narrowed-down document searches for obtaining useful document information in the same way as Embodiment 1.
[0117] Embodiment 3.
A mode is described below in which the number of currently extracted document terms that are the same as document terms extracted in the past is output as search transition information. Only the parts that differ from Embodiment 1 are described; the remaining parts are the same as in Embodiment 1.
[0118] In S103 of the document search process described in Embodiment 1, the search transition information generation unit 126 performs a search transition information generation process and generates search results concerning the search transition.
In S104 of the document search process, the search result display unit 105 acquires the search transition information (search result information) generated by the search transition information generation unit 126 from the search result storage unit 193 and displays the history area (5) of the search result screen described below.
The search result generation unit 120 stores the generated search result information in the search result storage unit 193 in association with the MEDLINE result information, the execution date and time of the search, and the query used for the search.
The other processes and the configuration of the document retrieval apparatus 100 in Embodiment 3 are the same as in Embodiment 1.
[0119] In the following description, the MEDLINE result information and the search result information are collectively referred to as result information.
[0120] The search result generation process (S103) of the search result generation unit 120 in Embodiment 3 is described next.
First, as in Embodiment 1, the MeSH term extraction unit 121 performs the MeSH term extraction process.
Then, the search transition information generation unit 126 of the search result generation unit 120 performs the search transition information generation process, which generates search transition information based on the current result information and the past result information.
[0121] FIG. 19 is a flowchart showing the flow of the search transition information generation process of the search transition information generation unit 126 in Embodiment 3.
The search transition information generation process performed by the search transition information generation unit 126 within the search result generation process (S103) of the search result generation unit 120 is described below with reference to FIG. 19.
[0122] First, the search transition information generation unit 126 acquires the past (previous) result information from the search result storage unit 193. It also acquires the current MEDLINE result information (S501).
Next, in order to perform the process of S503 on every item of current hit document information (contained in the MEDLINE result information), the search transition information generation unit 126 determines whether all current hit document information has been processed (S502).
If unprocessed current hit document information remains, the unit compares the item being processed (any one of the unprocessed items) with the past hit document information and, if the item duplicates any past hit document information, adds it to the duplicate document list. S502 is then performed again (S503).
If no unprocessed current hit document information remains (all current hit document information has been processed), the search transition information generation unit 126 determines whether all current representative terms have been processed, in order to perform the process of S505 on every current representative term corresponding to the extracted terms in the extracted term list contained in the search result information (S504).
If an unprocessed current representative term remains, the unit compares the representative term being processed (any one of the unprocessed terms) with the representative terms corresponding to the past extracted terms and, if the term duplicates any past representative term, adds it to the duplicate term list. S504 is then performed again (S505).
If no unprocessed current representative term remains (all current representative terms have been processed), the search transition information generation unit 126 stores the generated duplicate document list and duplicate term list in the search result storage unit 193 as search result information. At this time, it stores the search result information in association with the MEDLINE result information, the execution date and time of the search, and the query used for the search (S506).
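Steps S501 to S506 amount to two membership tests, one over documents and one over representative terms. The sketch below assumes each hit document is identified by a string ID and each representative term by its heading; the function name and the record layout are hypothetical.

```python
def build_transition_info(current_docs, current_terms, past_docs, past_terms):
    """Return the duplicate document list and duplicate term list.

    current_docs / past_docs: document identifiers (assumed to be strings)
    current_terms / past_terms: representative terms
    """
    past_doc_set = set(past_docs)
    past_term_set = set(past_terms)

    # S502-S503: keep each current hit document that also appeared in the past.
    duplicate_docs = [d for d in current_docs if d in past_doc_set]

    # S504-S505: keep each current representative term that also appeared in the past.
    duplicate_terms = [t for t in current_terms if t in past_term_set]

    # S506 would store this record together with the query and execution time.
    return {"duplicate_docs": duplicate_docs, "duplicate_terms": duplicate_terms}


# Hypothetical identifiers and terms for illustration.
info = build_transition_info(
    current_docs=["doc2", "doc3", "doc5"],
    current_terms=["Apoptosis", "Mice"],
    past_docs=["doc1", "doc2", "doc3"],
    past_terms=["Apoptosis", "Neoplasms"],
)
```

Converting the past lists to sets replaces the item-by-item comparison loop of the flowchart with constant-time lookups while producing the same lists.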
[0123] FIGS. 20 and 21 are conceptual diagrams of the search transition information in Embodiment 3.
Through the search transition information generation process above (S501 to S506), the search transition information generation unit 126 generates, as search transition information, the information (hit documents and the representative terms corresponding to extracted terms) that is common to the current result information (result information 1) and the previous result information (result information 2), as shown in FIG. 20.
As shown in FIG. 21, the search transition information generation unit 126 may also generate, as search transition information, the information common to the current result information (result information 1) and several sets of past result information (result information 2 and result information 3).
In this case, the search transition information generation unit 126 need only treat the previously generated search transition information as the past result information in the search transition information generation process above (S501 to S506) when generating the current search transition information.
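The multi-search variant of FIG. 21 can be illustrated by feeding each newly computed overlap back in as the "past" result for the next search. The helper names below and the handling of the very first search (an empty overlap, because nothing precedes it) are assumptions.

```python
def overlap(current, past):
    """Items of the current result that also appear in the past result."""
    past_set = set(past)
    return [x for x in current if x in past_set]


def chained_overlaps(results_oldest_first):
    """Overlap reported after each search when the previous overlap is reused."""
    overlaps = []
    past = None
    for i, current in enumerate(results_oldest_first):
        if i == 0:
            # Assumption: the first search has nothing to compare against.
            overlaps.append([])
            past = current
        else:
            common = overlap(current, past)
            overlaps.append(common)
            # The previous transition information becomes the past result.
            past = common
    return overlaps


# Hypothetical search history, oldest search first.
history = [
    ["a", "b", "c", "d"],
    ["b", "c", "d", "e"],
    ["c", "d", "f"],
]
steps = chained_overlaps(history)
```

With this chaining, the overlap after the third search contains only items present in all three results, which matches the intersection of result information 1, 2, and 3.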
[0124] FIG. 22 shows the history area (5) of the search result screen in Embodiment 3.
In FIG. 22, [1] is the clear button, [2] is the history number (No), [3] is the execution date and time (Time), [4] is the query (Query), [5] is the number of hit documents (Article-Num), [6] is the number of overlapping documents (Article-Overlap), [7] is the number of extracted terms (Term-Num), and [8] is the number of overlapping terms (Term-Overlap).
[0125] The clear button [1] is a button for requesting deletion of the past result information.
When the user clicks the clear button using the mouse 903 (an example of an input device), the document retrieval apparatus 100 deletes the past result information stored in the search result storage unit 193.
[0126] The history number [2] is a number that identifies the current search and each past search.
The search result display unit 105 assigns history numbers in ascending order, starting from the search with the most recent execution date and time.
[0127] The execution date and time [3] shows when each search was executed, the query [4] shows the query used in each search, the number of hit documents [5] shows the number of hit documents in each search, the number of overlapping documents [6] shows the number of documents in the duplicate document list, the number of extracted terms [7] shows the number of representative terms corresponding to the extracted terms, and the number of overlapping terms [8] shows the number of representative terms in the duplicate term list.
[0128] Embodiment 3 above described that the number of currently extracted document terms (or documents) that are the same as document terms (or documents) extracted in the past can be output as search transition information. This lets the user recognize how the search is progressing. For example, if the number of document terms (or documents) indicated by the search transition information is extremely small relative to the number extracted this time, the narrowing search was probably performed with an inappropriate search condition; if it is moderately smaller, the narrowing search was probably performed with an effective search condition. Embodiment 3 thus supports narrowed-down document searches for obtaining useful document information.
[0129] In other respects, Embodiment 3 supports narrowed-down document searches for obtaining useful document information in the same way as Embodiment 1.
[0130] Embodiment 4.
A mode is described below in which the representative terms that have a user-specified term as a synonym are acquired, and document information is searched with a query that uses, as search conditions, the representative terms the user selects from among those acquired. Only the parts that differ from the preceding embodiments are described; the remaining parts are the same as in those embodiments.
[0131] FIG. 23 is a flowchart showing the flow of the document search process of the document search unit 110 in Embodiment 4.
The document search process (S102) of the document search unit 110 that is performed when the user selects the synonym search function (Dictionary) in the main menu (1) of the search result screen is described below with reference to FIG. 23.
[0132] In the document search process (S102), the synonym search unit 111 first performs a synonym search process (S601 to S603), in which it outputs the representative terms corresponding to the term specified by the user (the search term) and has the user specify the terms to use for the search (the specified terms). Next, the query generation unit 115 performs a query generation process that generates a query combining the specified terms as AND or OR conditions (S604). Finally, the MEDLINE access unit 116 performs a MEDLINE access process that accesses MEDLINE 200 with the query generated by the query generation unit 115 and acquires the hit document information (S605).
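The query generation step (S604) can be sketched as joining the specified terms with a single Boolean operator. The description does not give the query syntax sent to MEDLINE 200, so the quoting style and the function name `build_query` are assumptions.

```python
def build_query(specified_terms, operator="AND"):
    """Join the specified terms with AND or OR (hypothetical query syntax)."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not specified_terms:
        raise ValueError("at least one specified term is required")
    # Quote each term so that multi-word terms stay together.
    quoted = [f'"{term}"' for term in specified_terms]
    return f" {operator} ".join(quoted)


and_query = build_query(["Neoplasms", "Apoptosis"])
or_query = build_query(["Neoplasms", "Myocardial Infarction"], "OR")
```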
[0133] The synonym search process (S601 to S603) of the synonym search unit 111 is described in detail next.
In S601, the search term input unit 112 of the synonym search unit 111 receives the term specified by the user from an input device as the search term. For example, the user enters an arbitrary term from the keyboard 902 (an example of an input device), or a file containing the terms specified by the user is read from a storage device (an example of an input device) and the terms are taken from the file. One or more terms may be given.
In S602, the representative term acquisition unit 113 of the synonym search unit 111 accesses the term DB 192 with the search term received by the search term input unit 112 and acquires the representative terms corresponding to the search term. The term DB 192 searches for MeSH term information (entries) whose representative term or synonym term matches the search term, and outputs the representative term of each matching entry to the representative term acquisition unit 113.
In S603, the specified term acquisition unit 114 of the synonym search unit 111 outputs the representative terms acquired by the representative term acquisition unit 113 to the search result display unit 105, and the search result display unit 105 displays the result area (4) of the search result screen, for example as shown in FIG. 24.
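The lookup in S602 can be pictured as a scan over term entries that each hold one representative term and its synonym terms. The in-memory table, the entries in it, and the case-insensitive comparison below are assumptions made for illustration; the actual structure of the term DB 192 is not specified here.

```python
# Invented sample entries standing in for the term DB 192.
TERM_DB = [
    {"heading": "Neoplasms", "synonyms": ["Cancer", "Tumors"]},
    {"heading": "Myocardial Infarction", "synonyms": ["Heart Attack"]},
    {"heading": "Common Cold", "synonyms": ["Cold"]},
    {"heading": "Cold Temperature", "synonyms": ["Cold"]},
]


def representative_terms(search_terms, term_db):
    """Map each search term to the headings whose heading or synonym matches it."""
    result = {}
    for term in search_terms:
        key = term.casefold()  # assumption: matching ignores case
        result[term] = [
            entry["heading"]
            for entry in term_db
            if key == entry["heading"].casefold()
            or key in (s.casefold() for s in entry["synonyms"])
        ]
    return result


found = representative_terms(["cancer", "Cold", "unknown term"], TERM_DB)
```

A search term such as "Cold" that maps to more than one representative term is the case in which the check boxes of S603 matter, because the user then chooses which representative terms become search conditions.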
[0134] FIG. 24 shows the result area (4) of the search result screen in Embodiment 4.
In FIG. 24, [1] is the search term (Input), [2] is a check box, and [3] is the representative term (Heading).
[0135] The search term [1] shows the search term, and the representative term [3] shows the representative term.
[0136] The check box [2] is for selecting the MeSH terms (specified terms) to use as search conditions.
The specified term acquisition unit 114 acquires, as specified terms, the representative terms [3] whose check boxes the user has checked using the mouse 903 (an example of an input device).
[0137] Embodiment 4 above described that the representative terms having the user-specified term as a synonym term can be acquired, and that document information can then be searched using the representative terms the user selects from among them. Because several synonym terms are generally in use for a MeSH term, searching with the term exactly as the user entered it does not necessarily retrieve all the necessary document information. Embodiment 4 therefore lets the user specify a more appropriate MeSH term as a search condition for a narrowing search, which supports narrowed-down document searches for obtaining useful document information. This is particularly effective when document information is to be searched in bulk with many terms as search conditions, for example when examining the literature on a large number of genes found to be expressed in a microarray experiment.
[0138] In other respects, Embodiment 4 supports narrowed-down document searches for obtaining useful document information in the same way as Embodiment 1.
[0139] Embodiment 5.
A mode in which document information is searched using the broad search method is described below. Only the parts that differ from the preceding embodiments are described; the remaining parts are the same as in those embodiments.
[0140] Here, the broad search method is a search method whose search conditions are the MeSH terms contained in the search statement plus the MeSH terms in the levels below them. The narrow search method is a search method whose search conditions are only the MeSH terms contained in the search statement.
The search method is stored, for example, in the user information DB 191 as user information. When accessing MEDLINE 200, the document search unit 110 performs the following processing according to the search method stored in the user information DB 191.
[0141] When the broad search method is specified, for each query with which the MEDLINE access unit 116 accesses MEDLINE 200 in the embodiments above, the query generation unit 115 generates a query to which the MeSH terms in the levels below each MeSH term are added as conditions. The MEDLINE access unit 116 accesses MEDLINE 200 with that query and acquires the MEDLINE result information.
[0142] In this case, the query generation unit 115 outputs each term in the query to the term DB 192. The term DB 192 looks up the MeSH number of the MeSH term that matches each term, searches for the MeSH numbers that begin with that MeSH number (prefix match), and outputs the MeSH terms having the matching MeSH numbers to the query generation unit 115. The query generation unit 115 then generates a query in which the MeSH terms output by the term DB 192 are combined as OR conditions. The MEDLINE access unit 116 accesses MEDLINE 200 with the query generated by the query generation unit 115 and acquires the MEDLINE result information.
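The broad search expansion can be sketched as a prefix match over MeSH numbers followed by an OR join. The dictionary standing in for the term DB 192, its sample entries, the quoting syntax, and the treatment of terms missing from the DB (kept unchanged) are all assumptions; how the expanded terms combine with other conditions in a larger query is not covered by this sketch.

```python
# Invented stand-in for the term DB 192: MeSH term -> MeSH number.
TERM_DB = {
    "Neoplasms": "C04",
    "Neoplasms by Site": "C04.588",
    "Breast Neoplasms": "C04.588.180",
    "Digestive System Diseases": "C06",
}


def expand_broad(query_terms, term_db):
    """Return each query term plus the terms whose MeSH number starts with its number."""
    expanded = []
    for term in query_terms:
        number = term_db.get(term)
        if number is None:
            candidates = [term]  # assumption: unknown terms are kept as they are
        else:
            candidates = [t for t, n in term_db.items() if n.startswith(number)]
        for t in candidates:
            if t not in expanded:
                expanded.append(t)
    return expanded


def broad_query(query_terms, term_db):
    """OR-join the expanded terms (hypothetical query syntax)."""
    return " OR ".join(f'"{t}"' for t in expand_broad(query_terms, term_db))


expanded_terms = expand_broad(["Neoplasms by Site"], TERM_DB)
query = broad_query(["Neoplasms by Site"], TERM_DB)
```

The narrow search method corresponds to skipping `expand_broad` and using the query terms unchanged.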
[0143] Embodiment 5 above described that document information can be searched with whichever of the narrow search and the broad search is specified, and that the MeSH terms contained in the hit document information can be output as information for narrowing down medical document information. Specifying the broad search also retrieves document information related to the terms in the levels below the search term specified by the user. In other words, document information related to the user-specified search term can be retrieved comprehensively, which supports narrowed-down document searches for obtaining useful document information.
[0144] In other respects, Embodiment 5 supports narrowed-down document searches for obtaining useful document information in the same way as Embodiment 1.
[0145] 実施の形態 6. [0145] Embodiment 6.
MEDLINE200の収録する文献情報の更新時の形態について以下に説明する。 ここで、前記の各実施の形態と異なる部分について説明し、その他の部分について は前記の各実施の形態と同様であるものとする。  The form at the time of updating the literature information recorded in MEDLINE 200 will be described below. Here, the parts different from the respective embodiments will be described, and the other parts shall be the same as those of the respective embodiments.
[0146] MEDLINE更新情報取得部 102は、 NLMの FTPサイトに定期的にアクセスして MEDLINE200の更新情報を取得して、更新情報に基づいてターム DB192の各 M eSHターム情報 (エントリ)の全文献中ターム毎文献数を更新する。このとき、 MeSH ターム抽出部 121は、更新情報の示す文献情報をヒット文献として扱い、上記実施の 形態 1と同様にして抽出タームリストを生成する。そして、 MEDLINE更新情報取得 部 102は抽出タームリストに設定された代表タームとヒット文献中ターム毎文献数とを ターム DB192に出力し、ターム DB192は MEDLINE更新情報取得部 102の出力 した代表タームと同じ代表タームと対応付けて記憶されている全文献中ターム毎文 献数に MEDLINE更新情報取得部 102の出力したヒット文献中ターム毎文献数を 加算する。これにより、 MEDLINE更新情報取得部 102は、更新情報に基づいてタ ーム DB192の各 MeSHターム情報 (エントリ)の全文献中ターム毎文献数を更新す る。  [0146] The MEDLINE update information acquisition unit 102 periodically accesses the NLM FTP site to acquire MEDLINE200 update information, and based on the update information, all documents of each MeSH term information (entry) in the term DB192 Update the number of documents for each medium term. At this time, the MeSH term extraction unit 121 treats the document information indicated by the update information as a hit document, and generates an extracted term list in the same manner as in the first embodiment. Then, the MEDLINE update information acquisition unit 102 outputs the representative terms set in the extracted term list and the number of documents in each hit document to the term DB 192. The term DB 192 is the same as the representative term output by the MEDLINE update information acquisition unit 102. The number of documents per term in the hit document output from the MEDLINE update information acquisition unit 102 is added to the number of documents per term in all documents stored in association with the representative term. As a result, the MEDLINE update information acquisition unit 102 updates the number of documents per term in all documents in each MeSH term information (entry) of the term DB 192 based on the update information.
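The count update described above can be sketched as follows. The sketch assumes each newly added document is given as a list of representative MeSH terms and that the term DB 192 can be modelled as a dictionary from term to the number of documents containing it; `count_docs_per_term` and `apply_update` are hypothetical names introduced only for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_docs_per_term(docs: Iterable[List[str]]) -> Counter:
    """Count, for each term, how many of the given documents contain it."""
    counts: Counter = Counter()
    for terms in docs:
        # A document is counted once per term even if the term repeats.
        counts.update(set(terms))
    return counts


def apply_update(total_counts: Dict[str, int], new_docs: Iterable[List[str]]) -> None:
    """Add the per-term counts of the update to the stored totals in place."""
    for term, n in count_docs_per_term(new_docs).items():
        total_counts[term] = total_counts.get(term, 0) + n


if __name__ == "__main__":
    totals = {"Neoplasms": 10}
    apply_update(totals, [["Neoplasms", "Humans"], ["Humans", "Humans"]])
    print(totals)
```

Treating the update as a batch of hit documents means the same counting routine can serve both search results and database updates, which matches the reuse of the MeSH term extraction unit 121 described above.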
[0147] The MEDLINE update information acquisition unit 102 also notifies the user of updates to the document information of MEDLINE 200. For this purpose, the user's e-mail address is stored as user information in the user information DB 191, and the MEDLINE update information acquisition unit 102 acquires the e-mail address from the user information DB 191 and sends to that address an e-mail announcing the update of the MEDLINE 200 document information. Furthermore, a query defined by the user may be stored as user information, and when acquiring update information for MEDLINE 200 the MEDLINE update information acquisition unit 102 may access MEDLINE 200 with that query and notify the user of the search result.
[0148] It is also advisable to allow the user information to be changed, including the e-mail address to which updates of the MEDLINE 200 document information are notified, the designation of a notification method other than e-mail, the user-defined query, the threshold for adjusting the term score described in Embodiment 1, the MeSH term hierarchy level set in the list box described in Embodiment 2 (FIG. 18 [1]), the choice between the broad search and narrow search methods described in Embodiment 5, and the user's password.
For example, the user environment setting unit 101 may display, on the CRT display device 901 (an example of a display device), check boxes, text boxes, and the like for specifying the password and environment setting information such as the e-mail address, the user-defined query, the term score threshold, the MeSH term hierarchy list, and the search method, and may update the user information in the user information DB 191 with the information specified by the user.
[0149] Embodiment 6 above described that updates to the document information of MEDLINE 200 can be handled. That is, even when new document information is added to MEDLINE 200, the term score described in Embodiment 1 can be calculated.
[0150] It was also described that the user can be notified of updates to the document information of MEDLINE 200. This makes it possible to prompt a document search over the document information newly added to MEDLINE 200.
[0151] It was further described that the notification destination for MEDLINE 200 updates, the query used for notifying search results, the threshold for adjusting the term score described in Embodiment 1, the MeSH term hierarchy level set in the list box described in Embodiment 2, the search method described in Embodiment 5, the login password, and the like can be changed. This improves convenience for the user.
[0152] As described above, Embodiment 6 can support a narrowing-down document search for acquiring useful document information.
Brief Description of Drawings
[0153] [FIG. 1] Configuration diagram of the document search apparatus 100 in Embodiment 1.
[FIG. 2] Configuration diagram of the document search unit 110 in Embodiment 1.
[FIG. 3] Configuration diagram of the search result generation unit 120 in Embodiment 1.
[FIG. 4] Diagram showing the table of MeSH term information stored and managed by the term DB 192 in Embodiment 1.
[FIG. 5] Tree diagram of MeSH Descriptors.
[FIG. 6] Tree diagram of MeSH Descriptors.
[FIG. 7] Tree diagram of MeSH Descriptors.
[FIG. 8] Tree diagram of MeSH Descriptors.
[FIG. 9] Tree diagram of MeSH Descriptors.
[FIG. 10] Diagram showing the screen (search result screen) on which the search result display unit 105 displays search results in Embodiment 1.
[FIG. 11] Diagram showing the external appearance of the document search apparatus 100 in Embodiment 1.
[FIG. 12] Hardware configuration diagram of the document search apparatus 100 in Embodiment 1.
[FIG. 13] Flowchart showing the flow of the document search process of the document search apparatus 100 in Embodiment 1.
[FIG. 14] Flowchart showing the flow of the MeSH term extraction process of the MeSH term extraction unit 121 in Embodiment 1.
[FIG. 15] Flowchart showing the flow of the additional information generation process of the additional information generation unit 122 in Embodiment 1.
[FIG. 16] Flowchart showing the flow of the term score calculation process of the term score calculation unit 124 in Embodiment 1.
[FIG. 17] Diagram showing result (4) of the search result screen in Embodiment 1.
[FIG. 18] Diagram showing result (4) of the search result screen in Embodiment 2.
[FIG. 19] Flowchart showing the flow of the search transition information generation process of the search transition information generation unit 126 in Embodiment 3.
[FIG. 20] Conceptual diagram of search transition information in Embodiment 3.
[FIG. 21] Conceptual diagram of search transition information in Embodiment 3.
[FIG. 22] Diagram showing history (5) of the search result screen in Embodiment 3.
[FIG. 23] Flowchart showing the flow of the document search process of the document search unit 110 in Embodiment 4.
[FIG. 24] Diagram showing result (4) of the search result screen in Embodiment 4.
Explanation of Reference Numerals
100 document search apparatus, 101 user environment setting unit, 102 MEDLINE update information acquisition unit, 103 display change unit, 104 login unit, 105 search result display unit, 110 document search unit, 111 synonym search unit, 112 search term input unit, 113 representative term acquisition unit, 114 designated term acquisition unit, 115 query generation unit, 116 MEDLINE access unit, 120 search result generation unit, 121 MeSH term extraction unit, 122 additional information generation unit, 123 classification acquisition unit, 124 term score calculation unit, 125 MeSH number acquisition unit, 126 search transition information generation unit, 191 user information DB, 192 term DB, 193 search result storage unit, 200 MEDLINE, 210 OMIM, 220 MeSH, 901 CRT display device, 902 K/B, 903 mouse, 904 FDD, 905 CDD, 906 printer device, 907 scanner device, 910 system unit, 911 CPU, 912 bus, 913 ROM, 914 RAM, 915 communication board, 920 magnetic disk device, 921 OS, 922 window system, 923 program group, 924 file group, 931 telephone, 932 FAX machine, 940 Internet, 941 web server, 942 LAN.

Claims

[1] A document search apparatus that is connected to a document database storing a set of document information containing thesaurus terms and to a classification database storing each thesaurus term in association with a classification of gene, disease, compound, or tissue, and that searches for documents, the apparatus comprising:
a search statement input unit that inputs a search statement from an input device;
a document information acquisition unit that acquires, from the document database, document information corresponding to the search statement input by the search statement input unit;
a thesaurus term extraction unit that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit;
a classification acquisition unit that acquires, from the classification database, the classification of each of the thesaurus terms extracted by the thesaurus term extraction unit; and
an output unit that outputs, to an output device, the thesaurus terms extracted by the thesaurus term extraction unit in association with the classifications acquired by the classification acquisition unit.
[2] A document search apparatus that is connected to a document database storing a set of document information containing thesaurus terms and to a classification database storing each thesaurus term in association with a classification that is one of gene, disease, compound, and tissue and that is based on the hierarchical structure of the thesaurus terms, and that searches for documents, the apparatus comprising:
a search statement input unit that inputs a search statement from an input device;
a document information acquisition unit that acquires, from the document database, document information corresponding to the search statement input by the search statement input unit;
a thesaurus term extraction unit that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit;
a classification acquisition unit that acquires, from the classification database, the classification of each of the thesaurus terms extracted by the thesaurus term extraction unit; and
an output unit that outputs, to an output device, the thesaurus terms extracted by the thesaurus term extraction unit in association with the classifications acquired by the classification acquisition unit.
[3] A document search apparatus that is connected to a document database storing a set of document information containing thesaurus terms, and that searches for documents, the apparatus comprising:
a search statement input unit that inputs a search statement from an input device;
a document information acquisition unit that acquires, from the document database, document information corresponding to the search statement input by the search statement input unit;
a thesaurus term extraction unit that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit;
a term score calculation unit that calculates, for each thesaurus term extracted by the thesaurus term extraction unit, a score that is the ratio of the proportion of items of acquired document information containing the thesaurus term, taken over the number of items of acquired document information, to the proportion of items of document information containing the thesaurus term, taken over the number of items of document information in the set of document information; and
an output unit that outputs, to an output device, the thesaurus terms extracted by the thesaurus term extraction unit in association with the scores calculated by the term score calculation unit.
[4] The document search apparatus according to claim 3, further comprising:
a storage device that stores a first threshold for the number of items of acquired document information and a second threshold for the number of items of acquired document information containing the thesaurus term,
wherein the term score calculation unit
acquires the first threshold and the second threshold from the storage device and, based on the first threshold, the second threshold, the number of items of acquired document information, and the number of items of acquired document information containing the thesaurus term, calculates a lower score for a thesaurus term for which the number of items of acquired document information is less than the first threshold and the number of items of acquired document information containing the thesaurus term is less than the second threshold.
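For orientation only, the score of claim 3 and the threshold adjustment of claim 4 can be sketched as below. The claims state only that the score is calculated lower when both counts fall below their thresholds; the multiplicative `penalty` factor, its default value, the zero-division handling, and the function name `term_score` are assumptions of this sketch and are not taken from the claims.

```python
from typing import Optional


def term_score(
    hits_with_term: int,
    hits: int,
    all_with_term: int,
    all_docs: int,
    first_threshold: Optional[int] = None,
    second_threshold: Optional[int] = None,
    penalty: float = 0.5,  # hypothetical: the claims do not say how much lower
) -> float:
    """Ratio of the term's share among hit documents to its share overall."""
    if hits == 0 or all_with_term == 0 or all_docs == 0:
        # Undefined ratio: return 0.0 (an assumption of this sketch).
        return 0.0
    score = (hits_with_term / hits) / (all_with_term / all_docs)
    if (
        first_threshold is not None
        and second_threshold is not None
        and hits < first_threshold
        and hits_with_term < second_threshold
    ):
        # Small result sets give unreliable ratios, so lower the score.
        score *= penalty
    return score


if __name__ == "__main__":
    # A term found in 5 of 10 hits but only 100 of 1000 documents overall.
    print(term_score(5, 10, 100, 1000))
```

A score above 1 indicates that the term is over-represented in the hit documents relative to the whole collection, which is what makes it a candidate for narrowing down the search.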
[5] A document search apparatus that is connected to a document database storing a set of document information containing thesaurus terms, and that searches for documents, the apparatus comprising:
a search statement input unit that inputs a search statement from an input device;
a document information acquisition unit that acquires, from the document database, document information corresponding to the search statement input by the search statement input unit;
a thesaurus term extraction unit that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit and stores them in a storage device;
a duplicate term count calculation unit that calculates the number of thesaurus terms, among the thesaurus terms extracted this time by the thesaurus term extraction unit, that are the same as thesaurus terms extracted in the past and stored in the storage device by the thesaurus term extraction unit; and
an output unit that outputs, to an output device, the number of thesaurus terms calculated by the duplicate term count calculation unit.
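The duplicate term count of claim 5 amounts to a set intersection between the current extraction and the stored history. The sketch below assumes the storage device can be modelled as an in-memory set; the class name `TermHistory` and its method are hypothetical.

```python
from typing import Iterable, Set


class TermHistory:
    """In-memory stand-in for the storage device holding past extracted terms."""

    def __init__(self) -> None:
        self._past: Set[str] = set()

    def count_duplicates_and_store(self, current_terms: Iterable[str]) -> int:
        """Count current terms already seen, then add them to the history."""
        current = set(current_terms)
        duplicates = len(current & self._past)
        self._past |= current
        return duplicates


if __name__ == "__main__":
    history = TermHistory()
    print(history.count_duplicates_and_store(["Neoplasms", "Humans"]))
    print(history.count_duplicates_and_store(["Humans", "Mice"]))
```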
[6] A document search apparatus that is connected to a document database storing a set of document information containing thesaurus terms and to a hierarchy database storing hierarchy information of the thesaurus terms, and that searches for documents, the apparatus comprising:
a search statement input unit that inputs a search statement from an input device;
a document information acquisition unit that acquires, from the document database, document information corresponding to the search statement input by the search statement input unit;
a thesaurus term extraction unit that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit;
a thesaurus hierarchy acquisition unit that acquires, from the hierarchy database, hierarchy information on the thesaurus terms extracted by the thesaurus term extraction unit;
a designated hierarchy input unit that inputs a designation of a hierarchy level of the thesaurus terms;
a thesaurus term selection unit that selects, from among the thesaurus terms extracted by the thesaurus term extraction unit and based on the hierarchy information acquired by the thesaurus hierarchy acquisition unit, the thesaurus terms corresponding to the hierarchy level input by the designated hierarchy input unit; and
an output unit that outputs, to an output device, the thesaurus terms selected by the thesaurus term selection unit.
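The selection by hierarchy level in claim 6 can be sketched under the assumption that the hierarchy information is a MeSH-style tree number whose dot-separated segments give the depth of a term. The names `HIERARCHY_DB`, `level_of`, and `select_terms_at_level` and the sample entries are hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for the hierarchy database: term -> tree number.
HIERARCHY_DB: Dict[str, str] = {
    "Neoplasms": "C04",
    "Neoplasms by Site": "C04.588",
    "Breast Neoplasms": "C04.588.180",
}


def level_of(tree_number: str) -> int:
    """Depth of a term: one level per dot-separated segment."""
    return tree_number.count(".") + 1


def select_terms_at_level(
    terms: List[str], hierarchy_db: Dict[str, str], level: int
) -> List[str]:
    """Keep only the extracted terms that sit at the designated level."""
    return [
        t for t in terms
        if t in hierarchy_db and level_of(hierarchy_db[t]) == level
    ]


if __name__ == "__main__":
    extracted = ["Neoplasms", "Breast Neoplasms", "Neoplasms by Site"]
    print(select_terms_at_level(extracted, HIERARCHY_DB, 2))
```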
[7] A document search apparatus that is connected to a document database storing a set of document information containing thesaurus terms, and that searches for documents, the apparatus comprising:
a search method input unit that inputs, for a search statement containing a thesaurus term, a designation of either a narrow search method, in which the thesaurus term contained in the search statement is used as the search condition, or a broad search method, in which the thesaurus term contained in the search statement and the thesaurus terms in the layers below that thesaurus term are used as the search condition;
a search statement input unit that inputs a search statement containing a thesaurus term from an input device;
a document information acquisition unit that acquires, from the document database, document information corresponding to the search statement input by the search statement input unit, in accordance with the search condition of the search method input by the search method input unit;
a thesaurus term extraction unit that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit; and
an output unit that outputs, to an output device, the thesaurus terms extracted by the thesaurus term extraction unit.
[8] A document search apparatus that is connected to a document database storing a set of document information containing thesaurus terms and to a synonym database storing synonyms of the thesaurus terms, and that searches for documents, the apparatus comprising:
a search word input unit that inputs a plurality of search words from an input device;
a synonymous term acquisition unit that collectively acquires, from the synonym database, the thesaurus terms that have each of the plurality of search words input by the search word input unit as a synonym, and displays them on a display device;
a search word designation unit that causes a thesaurus term to be used as a search word to be designated from among the thesaurus terms collectively acquired and displayed by the synonymous term acquisition unit;
a search statement input unit that inputs, from an input device, a search statement containing the thesaurus term designated through the search word designation unit;
a document information acquisition unit that acquires, from the document database, document information corresponding to the search statement input by the search statement input unit;
a thesaurus term extraction unit that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit; and
an output unit that outputs, to an output device, the thesaurus terms extracted by the thesaurus term extraction unit.
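The collective synonym lookup of the synonymous term acquisition unit can be sketched as a single pass over all search words. The sketch assumes the synonym database maps each thesaurus term to a list of synonyms and that matching is case-insensitive; `SYNONYM_DB`, `lookup_thesaurus_terms`, and the sample entries are hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for the synonym database: thesaurus term -> synonyms.
SYNONYM_DB: Dict[str, List[str]] = {
    "Neoplasms": ["cancer", "tumor", "tumors"],
    "Breast Neoplasms": ["breast cancer", "breast tumor"],
    "Myocardial Infarction": ["heart attack"],
}


def lookup_thesaurus_terms(
    search_words: List[str], synonym_db: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """For every search word at once, list the terms it is a synonym of."""
    result: Dict[str, List[str]] = {}
    for word in search_words:
        key = word.lower()
        result[word] = sorted(
            term
            for term, synonyms in synonym_db.items()
            if key == term.lower() or key in (s.lower() for s in synonyms)
        )
    return result


if __name__ == "__main__":
    print(lookup_thesaurus_terms(["cancer", "heart attack"], SYNONYM_DB))
```

The user would then pick, from the candidates listed for each word, the thesaurus term to place in the search statement.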
[9] A document search method of a document search apparatus that is connected to a document database storing a set of document information containing thesaurus terms and to a classification database storing each thesaurus term in association with a classification of gene, disease, compound, or tissue, and that searches for documents, the method comprising:
a search statement input process in which a search statement input unit inputs a search statement from an input device;
a document information acquisition process in which a document information acquisition unit acquires, from the document database, document information corresponding to the search statement input by the search statement input unit;
a thesaurus term extraction process in which a thesaurus term extraction unit extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit;
a classification acquisition process in which a classification acquisition unit acquires, from the classification database, the classification of each of the thesaurus terms extracted by the thesaurus term extraction unit; and
an output process in which an output unit outputs, to an output device, the thesaurus terms extracted by the thesaurus term extraction unit in association with the classifications acquired by the classification acquisition unit.
[10] A document search method of a document search apparatus that is connected to a document database storing a set of document information containing thesaurus terms and to a classification database storing each thesaurus term in association with a classification that is one of gene, disease, compound, and tissue and that is based on the hierarchical structure of the thesaurus terms, and that searches for documents, the method comprising:
a search statement input process in which a search statement input unit inputs a search statement from an input device;
a document information acquisition process in which a document information acquisition unit acquires, from the document database, document information corresponding to the search statement input by the search statement input unit;
a thesaurus term extraction process in which a thesaurus term extraction unit extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit;
a classification acquisition process in which a classification acquisition unit acquires, from the classification database, the classification of each of the thesaurus terms extracted by the thesaurus term extraction unit; and
an output process in which an output unit outputs, to an output device, the thesaurus terms extracted by the thesaurus term extraction unit in association with the classifications acquired by the classification acquisition unit.
[11] A document search method of a document search apparatus that is connected to a document database storing a set of document information containing thesaurus terms, and that searches for documents, the method comprising:
a search statement input process in which a search statement input unit inputs a search statement from an input device;
a document information acquisition process in which a document information acquisition unit acquires, from the document database, document information corresponding to the search statement input by the search statement input unit;
a thesaurus term extraction process in which a thesaurus term extraction unit extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit;
a term score calculation process in which a term score calculation unit calculates, for each thesaurus term extracted by the thesaurus term extraction unit, a score that is the ratio of the proportion of items of acquired document information containing the thesaurus term, taken over the number of items of acquired document information, to the proportion of items of document information containing the thesaurus term, taken over the number of items of document information in the set of document information; and
an output process in which an output unit outputs, to an output device, the thesaurus terms extracted by the thesaurus term extraction unit in association with the scores calculated by the term score calculation unit.
[12] The document search method according to claim 11, wherein the document search apparatus further comprises a storage device that stores a first threshold for the number of items of acquired document information and a second threshold for the number of items of document information containing the thesaurus term, and
wherein, in the term score calculation process,
the term score calculation unit acquires the first threshold and the second threshold from the storage device and, based on the first threshold, the second threshold, the number of items of acquired document information, and the number of items of acquired document information containing the thesaurus term, calculates a lower score for a thesaurus term for which the number of items of acquired document information is less than the first threshold and the number of items of acquired document information containing the thesaurus term is less than the second threshold.
[13] A document retrieval method for a document retrieval apparatus that is connected to a document database storing a set of document information containing thesaurus terms and that retrieves documents, wherein:
a search query input unit performs a search query input process of inputting a search query from an input device;
a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, document information corresponding to the search query input by the search query input unit;
a thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit and storing them in a storage device;
a duplicate term count calculation unit performs a duplicate term count calculation process of calculating, among the thesaurus terms extracted this time by the thesaurus term extraction unit, the number of thesaurus terms identical to thesaurus terms that the thesaurus term extraction unit extracted in the past and stored in the storage device; and
an output unit performs an output process of outputting, to an output device, the number of thesaurus terms calculated by the duplicate term count calculation unit.
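The duplicate term count of claim 13 can be sketched as a set intersection between the current extraction and the stored history. The class name `DuplicateTermCounter` and the in-memory set standing in for the storage device are assumptions for illustration only.

```python
class DuplicateTermCounter:
    """Counts how many currently extracted terms were already extracted before."""

    def __init__(self):
        self._history = set()  # stand-in for the storage device (assumption)

    def count_and_store(self, extracted_terms):
        current = set(extracted_terms)
        duplicates = len(current & self._history)  # terms seen in past searches
        self._history |= current                   # remember this extraction
        return duplicates


counter = DuplicateTermCounter()
counter.count_and_store(["Neoplasms", "Apoptosis"])
print(counter.count_and_store(["Apoptosis", "Insulin"]))  # 1
```

A high count suggests that the new search largely overlaps earlier ones, which is one way a user might judge whether a reformulated query is finding anything new.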
[14] A document retrieval method for a document retrieval apparatus that is connected to a document database storing a set of document information containing thesaurus terms and to a hierarchy database storing hierarchy information of the thesaurus terms, and that retrieves documents, wherein:
a search query input unit performs a search query input process of inputting a search query from an input device;
a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, document information corresponding to the search query input by the search query input unit;
a thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit;
a thesaurus hierarchy acquisition unit performs a thesaurus hierarchy acquisition process of acquiring, from the hierarchy database, hierarchy information on the thesaurus terms extracted by the thesaurus term extraction unit;
a designated hierarchy input unit performs a designated hierarchy input process of inputting a designation of a thesaurus term hierarchy level;
a thesaurus term selection unit performs a thesaurus term selection process of selecting, from among the thesaurus terms extracted by the thesaurus term extraction unit, the thesaurus terms corresponding to the hierarchy level input by the designated hierarchy input unit, based on the hierarchy information acquired by the thesaurus hierarchy acquisition unit; and
an output unit performs an output process of outputting, to an output device, the thesaurus terms selected by the thesaurus term selection unit.
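The selection step of claim 14 can be sketched as a filter on hierarchy level. The claim does not say how hierarchy information is represented; the sketch below assumes a mapping from each term to an integer depth, and the function name `select_terms_at_level` and the sample terms are hypothetical.

```python
def select_terms_at_level(extracted_terms, hierarchy_levels, designated_level):
    """Keep only the extracted terms whose hierarchy level matches the designation."""
    return [term for term in extracted_terms
            if hierarchy_levels.get(term) == designated_level]


# Assumed hierarchy information: term -> depth in the thesaurus tree.
levels = {"Neoplasms": 1, "Carcinoma": 2, "Adenocarcinoma": 3}
print(select_terms_at_level(["Carcinoma", "Neoplasms", "Adenocarcinoma"], levels, 2))
# ['Carcinoma']
```

Terms missing from the hierarchy mapping are simply skipped here; whether they should be reported instead is a design choice the claim leaves open.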
[15] A document retrieval method for a document retrieval apparatus that is connected to a document database storing a set of document information containing thesaurus terms and that retrieves documents, wherein:
a search method input unit performs a search method input process of inputting, for a search query containing a thesaurus term, a designation of either a narrow search method, whose search condition is the thesaurus term contained in the search query, or a broad search method, whose search condition is the thesaurus term contained in the search query together with the thesaurus terms in the layers below that thesaurus term;
a search query input unit performs a search query input process of inputting a search query containing a thesaurus term from an input device;
a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, document information corresponding to the search query input by the search query input unit, in accordance with the search condition of the search method input by the search method input unit;
a thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit; and
an output unit performs an output process of outputting, to an output device, the thesaurus terms extracted by the thesaurus term extraction unit.
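The difference between the narrow and broad search methods of claim 15 can be sketched as an optional expansion of the query term to its lower-layer terms. The function name `search_condition_terms`, the `narrower` mapping from a term to its direct children, and the choice to include every descendant rather than only the immediate children are assumptions.

```python
def search_condition_terms(term, narrower, method):
    """Return the set of thesaurus terms used as the search condition."""
    if method == "narrow":
        return {term}                      # the query term only
    if method != "broad":
        raise ValueError(f"unknown search method: {method!r}")
    found = {term}
    stack = [term]
    while stack:                           # walk down the thesaurus tree
        for child in narrower.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


tree = {"Neoplasms": ["Carcinoma", "Sarcoma"], "Carcinoma": ["Adenocarcinoma"]}
print(sorted(search_condition_terms("Neoplasms", tree, "broad")))
# ['Adenocarcinoma', 'Carcinoma', 'Neoplasms', 'Sarcoma']
```

The `found` set also guards against cycles in the mapping, so a malformed hierarchy does not cause an endless loop.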
[16] A document retrieval method for a document retrieval apparatus that is connected to a document database storing a set of document information containing thesaurus terms and to a synonym database storing synonyms of the thesaurus terms, and that retrieves documents, wherein:
a search word input unit performs a search word input process of inputting a plurality of search words from an input device;
a synonymous term acquisition unit performs a synonymous term acquisition process of acquiring, from the synonym database, for each of the plurality of search words input by the search word input unit, the thesaurus terms that have that search word as a synonym;
a designation unit performs a designation process of having the thesaurus terms to be used as search words designated from among the thesaurus terms acquired by the synonymous term acquisition unit;
a search query input unit performs a search query input process of inputting, from the input device, a search query containing the thesaurus terms designated through the designation unit;
a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, document information corresponding to the search query input by the search query input unit;
a thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit; and
an output unit performs an output process of outputting, to an output device, the thesaurus terms extracted by the thesaurus term extraction unit.
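The synonymous term acquisition of claim 16 can be sketched as a reverse lookup from each search word to the thesaurus terms that list it as a synonym. The function name `thesaurus_candidates`, the dictionary layout of the synonym database, the case-insensitive matching, and the sample entries are assumptions.

```python
def thesaurus_candidates(search_words, synonym_db):
    """Map each search word to the thesaurus terms that list it as a synonym."""
    candidates = {}
    for word in search_words:
        key = word.lower()  # assumption: match case-insensitively
        candidates[word] = sorted(
            term for term, synonyms in synonym_db.items()
            if key in {s.lower() for s in synonyms}
        )
    return candidates


# Assumed synonym database: thesaurus term -> its synonyms.
db = {"Neoplasms": ["cancer", "tumor"], "Myocardial Infarction": ["heart attack"]}
print(thesaurus_candidates(["Cancer", "heart attack"], db))
# {'Cancer': ['Neoplasms'], 'heart attack': ['Myocardial Infarction']}
```

The returned candidates correspond to what the designation unit would present so that the user can choose which thesaurus terms go into the search query.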
[17] A document retrieval program that causes a computer to execute the document retrieval method according to any one of claims 9 to 16.
[18] A document retrieval apparatus that is connected to a document database storing a set of document information and to a classification database storing each document term in association with a specific classification, and that retrieves documents, the apparatus comprising:
a search query input unit that inputs a search query from an input device;
a document information acquisition unit that acquires, from the document database, document information corresponding to the search query input by the search query input unit;
a document term extraction unit that extracts the document terms contained in the acquired document information acquired by the document information acquisition unit;
a classification acquisition unit that acquires, from the classification database, the classification of each document term extracted by the document term extraction unit; and
an output unit that outputs, to an output device, each document term extracted by the document term extraction unit in association with the classification acquired for it by the classification acquisition unit.
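The classification acquisition and output of claim 18 can be sketched as a lookup followed by grouping. The function name `group_by_classification`, the dictionary standing in for the classification database, the `"unclassified"` fallback label, and the sample terms are assumptions; grouping is only one possible way of outputting terms "in association with" their classification.

```python
from collections import defaultdict


def group_by_classification(terms, classification_db):
    """Group extracted document terms under the classification stored for each."""
    groups = defaultdict(list)
    for term in terms:
        # Hypothetical fallback label for terms absent from the database.
        groups[classification_db.get(term, "unclassified")].append(term)
    return dict(groups)


db = {"TP53": "gene", "Neoplasms": "disease", "Cisplatin": "compound"}
print(group_by_classification(["TP53", "Cisplatin", "Neoplasms", "Liver"], db))
```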
[19] A document retrieval apparatus that is connected to a document database storing a set of document information and that retrieves documents, the apparatus comprising:
a search query input unit that inputs a search query from an input device;
a document information acquisition unit that acquires, from the document database, document information corresponding to the search query input by the search query input unit;
a document term extraction unit that extracts the document terms contained in the acquired document information acquired by the document information acquisition unit;
a term score calculation unit that calculates, for each document term extracted by the document term extraction unit, a score expressing the proportion of acquired document information items containing the document term among all acquired document information items, relative to the proportion of document information items containing the document term among all document information items in the set of document information; and
an output unit that outputs, to an output device, each document term extracted by the document term extraction unit in association with the score calculated for it by the term score calculation unit.
[20] The document retrieval apparatus according to claim 19, further comprising
a storage device that stores a first threshold for the number of acquired document information items and a second threshold for the number of acquired document information items containing the document term, wherein
the term score calculation unit
acquires the first threshold and the second threshold from the storage device and, based on the first threshold, the second threshold, the number of acquired document information items, and the number of acquired document information items containing the document term, calculates a lower score for any document term for which the number of acquired document information items is less than the first threshold and the number of acquired document information items containing the document term is less than the second threshold.
[21] A document retrieval apparatus that is connected to a document database storing a set of document information and that retrieves documents, the apparatus comprising:
a search query input unit that inputs a search query from an input device;
a document information acquisition unit that acquires, from the document database, document information corresponding to the search query input by the search query input unit;
a document term extraction unit that extracts the document terms contained in the acquired document information acquired by the document information acquisition unit and stores them in a storage device;
a duplicate term count calculation unit that calculates, among the document terms extracted this time by the document term extraction unit, the number of document terms identical to document terms that the document term extraction unit extracted in the past and stored in the storage device; and
an output unit that outputs, to an output device, the number of document terms calculated by the duplicate term count calculation unit.
[22] A document retrieval method for a document retrieval apparatus that is connected to a document database storing a set of document information and to a classification database storing each document term in association with a specific classification, and that retrieves documents, wherein:
a search query input unit performs a search query input process of inputting a search query from an input device;
a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, document information corresponding to the search query input by the search query input unit;
a document term extraction unit performs a document term extraction process of extracting the document terms contained in the acquired document information acquired by the document information acquisition unit;
a classification acquisition unit performs a classification acquisition process of acquiring, from the classification database, the classification of each document term extracted by the document term extraction unit; and
an output unit performs an output process of outputting, to an output device, each document term extracted by the document term extraction unit in association with the classification acquired for it by the classification acquisition unit.
[23] A document retrieval method for a document retrieval apparatus that is connected to a document database storing a set of document information and that retrieves documents, wherein:
a search query input unit performs a search query input process of inputting a search query from an input device;
a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, document information corresponding to the search query input by the search query input unit;
a document term extraction unit performs a document term extraction process of extracting the document terms contained in the acquired document information acquired by the document information acquisition unit;
a term score calculation unit performs a term score calculation process of calculating, for each document term extracted by the document term extraction unit, a score expressing the proportion of acquired document information items containing the document term among all acquired document information items, relative to the proportion of document information items containing the document term among all document information items in the set of document information; and
an output unit performs an output process of outputting, to an output device, each document term extracted by the document term extraction unit in association with the score calculated for it by the term score calculation unit.
[24] The document retrieval method according to claim 23, being a document retrieval method for the document retrieval apparatus further comprising a storage device that stores a first threshold for the number of acquired document information items and a second threshold for the number of document information items containing the document term, wherein,
in the term score calculation process,
the term score calculation unit acquires the first threshold and the second threshold from the storage device and, based on the first threshold, the second threshold, the number of acquired document information items, and the number of acquired document information items containing the document term, calculates a lower score for any document term for which the number of acquired document information items is less than the first threshold and the number of acquired document information items containing the document term is less than the second threshold.
[25] A document retrieval method for a document retrieval apparatus that is connected to a document database storing a set of document information and that retrieves documents, wherein:
a search query input unit performs a search query input process of inputting a search query from an input device;
a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, document information corresponding to the search query input by the search query input unit;
a document term extraction unit performs a document term extraction process of extracting the document terms contained in the acquired document information acquired by the document information acquisition unit and storing them in a storage device;
a duplicate term count calculation unit performs a duplicate term count calculation process of calculating, among the document terms extracted this time by the document term extraction unit, the number of document terms identical to document terms that the document term extraction unit extracted in the past and stored in the storage device; and
an output unit performs an output process of outputting, to an output device, the number of document terms calculated by the duplicate term count calculation unit.
[26] A document retrieval program that causes a computer to execute the document retrieval method according to any one of claims 22 to 25.
[27] A document retrieval apparatus that is connected to a document database storing a set of document information containing thesaurus terms and to a classification database storing each thesaurus term in association with a classification of gene, disease, compound, or tissue, and that retrieves documents, the apparatus comprising:
a search query input unit that inputs a search query from an input device;
a document information acquisition unit that acquires, from the document database, document information corresponding to the search query input by the search query input unit;
a thesaurus term extraction unit that extracts the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit;
a classification acquisition unit that acquires, from the classification database, the classification of each thesaurus term extracted by the thesaurus term extraction unit;
a term score calculation unit that calculates, for each thesaurus term extracted by the thesaurus term extraction unit, a score expressing the proportion of acquired document information items containing the thesaurus term among all acquired document information items, relative to the proportion of document information items containing the thesaurus term among all document information items in the set of document information; and
an output unit that outputs to an output device, together with each thesaurus term extracted by the thesaurus term extraction unit, the classification acquired for it by the classification acquisition unit and the score calculated for it by the term score calculation unit, as information concerning relationships among molecules, diseases, and the like.
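The combined output of claim 27 can be sketched as a small report that attaches a classification and a score to each extracted term. The function name `build_report`, the dictionaries standing in for the classification database and the precomputed scores, the `"unclassified"` label, and the sort order (by classification, then descending score) are assumptions; the claim does not prescribe any ordering.

```python
def build_report(terms, classification_db, scores):
    """Return (classification, term, score) rows for display."""
    rows = [
        (classification_db.get(term, "unclassified"), term, scores.get(term, 0.0))
        for term in terms
    ]
    # Assumed presentation: group by classification, highest score first.
    return sorted(rows, key=lambda row: (row[0], -row[2], row[1]))


classes = {"TP53": "gene", "BRCA1": "gene", "Neoplasms": "disease"}
scores = {"TP53": 4.0, "BRCA1": 9.5, "Neoplasms": 2.0}
for row in build_report(["TP53", "Neoplasms", "BRCA1"], classes, scores):
    print(row)
```

Listing genes, diseases, compounds, and tissues side by side with their scores is what lets a reader spot, for example, which genes co-occur unusually often with a queried disease.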
[28] A document retrieval method for a document retrieval apparatus that is connected to a document database storing a set of document information containing thesaurus terms and to a classification database storing each thesaurus term in association with a classification of gene, disease, compound, or tissue, and that retrieves documents, wherein:
a search query input unit performs a search query input process of inputting a search query from an input device;
a document information acquisition unit performs a document information acquisition process of acquiring, from the document database, document information corresponding to the search query input by the search query input unit;
a thesaurus term extraction unit performs a thesaurus term extraction process of extracting the thesaurus terms contained in the acquired document information acquired by the document information acquisition unit;
a classification acquisition unit performs a classification acquisition process of acquiring, from the classification database, the classification of each thesaurus term extracted by the thesaurus term extraction unit;
a term score calculation unit performs a term score calculation process of calculating, for each thesaurus term extracted by the thesaurus term extraction unit, a score expressing the proportion of acquired document information items containing the thesaurus term among all acquired document information items, relative to the proportion of document information items containing the thesaurus term among all document information items in the set of document information; and
an output unit performs an output process of outputting to an output device, together with each thesaurus term extracted by the thesaurus term extraction unit, the classification acquired for it by the classification acquisition unit and the score calculated for it by the term score calculation unit, as information concerning relationships among molecules, diseases, and the like.
[29] A document retrieval program that causes a computer to execute the document retrieval method according to claim 28.
PCT/JP2005/021652 2005-11-25 2005-11-25 Document retrieval device, method, and program WO2007060726A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2005/021652 WO2007060726A1 (en) 2005-11-25 2005-11-25 Document retrieval device, method, and program
JP2006515526A JP4398461B2 (en) 2005-11-25 2005-11-25 Literature retrieval device, literature retrieval method, and literature retrieval program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2005/021652 WO2007060726A1 (en) 2005-11-25 2005-11-25 Document retrieval device, method, and program

Publications (1)

Publication Number Publication Date
WO2007060726A1 true WO2007060726A1 (en) 2007-05-31

Family

ID=38066968

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2005/021652 WO2007060726A1 (en) 2005-11-25 2005-11-25 Document retrieval device, method, and program

Country Status (2)

Country Link
JP (1) JP4398461B2 (en)
WO (1) WO2007060726A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0528199A (en) * 1991-07-18 1993-02-05 Canon Inc Document retrieval device
JP2003036261A (en) * 2001-07-26 2003-02-07 Kyodo News Service Device, program, and method for automatically classifying sentence, and computer readable recording medium recorded with automated sentence classifying program
JP2005128872A (en) * 2003-10-24 2005-05-19 Toshiba Solutions Corp Document retrieving system and document retrieving program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NIWA Y.: "Dynamic co-occurrence analysis for interactive document retrieval.", 13 March 1996 (1996-03-13), pages 41 - 48, XP002995568 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPWO2020250812A1 (en) * 2019-06-10 2020-12-17
WO2020250812A1 (en) * 2019-06-10 2020-12-17 株式会社島津製作所 Document information providing method and program
CN114270450A (en) * 2019-06-10 2022-04-01 株式会社岛津制作所 Document information providing method and program
CN115455272A (en) * 2022-09-13 2022-12-09 共青科技职业学院 Page data processing method and device, electronic equipment and storage medium
CN116126997B (en) * 2023-04-04 2023-06-13 北京洞悉网络有限公司 Document deduplication storage method, system, device and storage medium

Also Published As

Publication number Publication date
JPWO2007060726A1 (en) 2009-05-07
JP4398461B2 (en) 2010-01-13

Similar Documents

Publication Publication Date Title
Mazandu et al. Gene ontology semantic similarity tools: survey on features and challenges for biological knowledge discovery
Lu PubMed and beyond: a survey of web tools for searching biomedical literature
US8280878B2 (en) Method and apparatus for real time text analysis and text navigation
US20100306213A1 (en) Merging Search Results
US7340460B1 (en) Vector analysis of histograms for units of a concept network in search query processing
US20090187548A1 (en) System and method for automatically classifying search results
US20040186828A1 (en) Systems and methods for enabling a user to find information of interest to the user
US20110320433A1 (en) Automated Joining of Disparate Data for Database Queries
US9824088B2 (en) Active knowledge guidance based on deep document analysis
US20120078979A1 (en) Method for advanced patent search and analysis
Arnaboldi et al. Wormicloud: a new text summarization tool based on word clouds to explore the C. elegans literature
JP5221664B2 (en) Information map management system and information map management method
US20050171931A1 (en) Database searching method and system
TWI290687B (en) System and method for search information based on classifications of synonymous words
JP2014010758A (en) File management device, file management method, and program
JP4398461B2 (en) Literature retrieval device, literature retrieval method, and literature retrieval program
Hu et al. Embracing information explosion without choking: Clustering and labeling in microblogging
JP4886014B2 (en) Literature retrieval device, literature retrieval method, and literature retrieval program
KR20140135100A (en) Method for providing program using semantic mashup technology
JP5757187B2 (en) File storage location candidate determination device, file storage location candidate determination method, file storage location determination support system, and computer program
Marygold et al. Exploring FlyBase Data Using QuickSearch
Hu et al. Ontology-based scalable and portable information extraction system to extract biological knowledge from huge collection of biomedical web documents
JPWO2007060727A1 (en) Literature retrieval apparatus, method and program
JP2009294768A (en) Information sharing device and information sharing program
Schedl et al. Automatically detecting members and instrumentation of music bands via web content mining

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2006515526

Country of ref document: JP

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 05809405

Country of ref document: EP

Kind code of ref document: A1