WO2007105530A1 - Related term acquiring device, related term acquiring method, technology trend information creating device, technology trend information creating method, and program - Google Patents

Info

Publication number
WO2007105530A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
document
term
technical
unit
Prior art date
Application number
PCT/JP2007/054257
Other languages
French (fr)
Japanese (ja)
Inventor
Hidetsugu Nanba
Original Assignee
Hiroshima City
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hiroshima City
Priority to JP2008505065A (patent JP5078164B2)
Publication of WO2007105530A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users

Definitions

  • The present invention relates to, for example, a related term acquisition device that acquires terms related to a given term, and to a technical trend information creation device and the like that creates technical trend information, which is information associating a technical term with the period of the technical documents that include that technical term.
  • A conventional information processing apparatus searches Web pages and automatically collects related terms (for example, see Non-Patent Document 1, Non-Patent Document 2, Non-Patent Document 3, and Non-Patent Document 4).
  • In general, in order to collect terms related to a technical term t from the Web, it is necessary first to collect descriptions relating to the term t and then to extract the terms related to t from those descriptions.
  • One of the key points is therefore to collect appropriate descriptions of the term t. For example, in Non-Patent Document 1, descriptions relating to the term t are collected by the method described below.
  • For a given term t, four types of queries, “What is t”, “T”, “t is”, and “t”, are first entered into a search engine, and the top 100 pages of the URLs obtained are retrieved. Next, each retrieved page is formatted and divided into sentences, only the sentences containing the term t are extracted, and related terms are collected from those sentences.
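The sentence-filtering step of this prior-art method can be sketched as follows. This is only an illustration under stated assumptions: the function name `sentences_containing`, the naive punctuation-based sentence splitter, and the sample page text are hypothetical and are not taken from Non-Patent Document 1. Query submission and page retrieval are omitted.

```python
import re


def sentences_containing(page_text: str, term: str) -> list[str]:
    """Split page text into sentences and keep only those mentioning the term."""
    # Assumption: a naive splitter that breaks after '.', '!' or '?' plus whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())
    # Case-insensitive containment check for the term t.
    return [s for s in sentences if term.lower() in s.lower()]


# Hypothetical page text standing in for a formatted Web page.
page = (
    "Morphological analysis splits text into morphemes. "
    "It is widely used. "
    "A tagger is often combined with morphological analysis."
)
hits = sentences_containing(page, "morphological analysis")
```

Related terms would then be collected only from `hits`, not from the whole page.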
  • In the prior art, there is also a survey paper creation support system that takes into account citation information between papers (see Non-Patent Document 5).
  • In recent years, research on automatically collecting terms related to a given term from a set of texts has been actively conducted (see Non-Patent Documents 1, 3, 4, and 6).
  • Non-Patent Document 1: Satoshi Sato and 1 other, “Automatic collection of related terms using the web”, Information Processing Society of Japan, Natural Language Processing, NL-153, p. 57-64, 2003
  • Non-Patent Document 2: Yasuhiro Sasaki and 2 others, “Proposal of an index for measuring the degree of association between terms”, 10th Annual Conference of the Association for Natural Language Processing, p. 25-28, 2004
  • Non-Patent Document 3: Kiyoaki Shirai and 3 others, “Attempt to create portal site automatically”, 10th Annual Conference of the Association for Natural Language Processing, p. 624-627, 2004
  • Non-Patent Document 4: Keisuke Ohara and 3 others, “Gathering Related Terms Using the Web”, 3rd Information Science Technology Forum (FIT2004), 2004
  • Non-Patent Document 5: Hidetsugu Nanba and 1 other, “Development of survey paper creation support system considering reference information between papers”, Natural Language Processing, Vol. 6, No. 5, p. 43-62, 1999
  • Non-Patent Document 6: Hidetsugu Nanba, “Automatic Collection of Related Terms Using Citation Information Between Papers”, 11th Annual Conference of the Association for Natural Language Processing, 2005
  • The present invention has been made to solve at least one of the above problems, and one purpose is to provide a related term acquisition device or the like capable of acquiring appropriate related terms by using citation relationships between documents.
  • Another purpose is to provide a technical trend information creation device or the like that can create technical trend information, which is information associating a technical term with a period relating to the technical documents that include that technical term.
  • A related term acquisition device according to the present invention includes the following units.
  • A document information storage unit that stores two or more pieces of document information, which is information indicating a document.
  • A term information receiving unit that receives term information, which is information indicating a term.
  • A document information acquisition unit that searches the two or more pieces of document information stored in the document information storage unit for document information having, in all or part of the document information, the term indicated by the term information received by the term information receiving unit, and that acquires from the document information storage unit either all of the retrieved document information or a part of it having at least cited document identification information, which is information identifying another document cited in the retrieved document information.
  • A cited document information acquisition unit that searches the two or more pieces of document information stored in the document information storage unit for cited document information, which is document information indicating the document identified by the cited document identification information included in all or part of the document information acquired by the document information acquisition unit, and that acquires from the document information storage unit the part of the retrieved cited document information corresponding to a predetermined item.
  • A related term information acquisition unit that acquires, from the part of the cited document information acquired by the cited document information acquisition unit, related term information, which is information indicating terms related to the term indicated by the term information received by the term information receiving unit.
  • A related term information output unit that outputs the related term information acquired by the related term information acquisition unit.
  • With such a configuration, because the cited document information acquisition unit acquires only a part of the cited document information, more appropriate related term information can be acquired.
  • In the related term acquisition device, the document information may include structure description information, which is information describing the structure of the document information, and the cited document information acquisition unit may acquire, from the cited document information indicating the cited document identified by the cited document identification information included in all or part of the document information acquired by the document information acquisition unit, the part that the structure description information associates with the predetermined item.
  • With such a configuration, the cited document information acquisition unit can easily acquire the part corresponding to the predetermined item.
  • The related term acquisition device may further include a cited document structure information acquisition unit that acquires cited document structure information, which is information indicating the correspondence between one or more parts of the cited document information and the items of those parts, and the cited document information acquisition unit may acquire the part of the cited document information corresponding to the predetermined item by using the cited document structure information acquired by the cited document structure information acquisition unit.
  • With such a configuration, the cited document information acquisition unit can easily acquire the part corresponding to the predetermined item by using the cited document structure information.
  • When the document information acquisition unit acquires a part of the document information, it may acquire a part having at least the cited document identification information, and the cited document information acquisition unit may acquire, from the cited document information indicating the document identified by the cited document identification information included in all or part of the document information acquired by the document information acquisition unit, the part corresponding to an item predetermined according to the citation format.
  • With such a configuration, the cited document information acquisition unit can acquire the part corresponding to the item according to the citation format. By appropriately setting, for each citation format, which item's part of the cited document information is to be acquired, more appropriate related term information can be acquired.
  • The related term information acquisition unit may include: related term candidate information acquiring means for acquiring, from the part of the cited document information acquired by the cited document information acquisition unit, related term candidate information, which is information that is a candidate for a related term; relevance level calculating means for calculating, for each piece of related term candidate information acquired by the related term candidate information acquiring means, a relevance level of the term indicated by that related term candidate information; and related term information determining means for determining, using the calculated relevance levels, which related term candidate information is to be the related term information.
  • With such a configuration, related term information can be appropriately acquired from the part of the cited document information acquired by the cited document information acquisition unit.
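The three means described above (candidate acquisition, relevance calculation, determination) can be sketched as follows. This is a rough sketch under stated assumptions, not the method of the patent: the source does not define the relevance measure at this point, so the number of acquired cited-document parts mentioning a candidate is used as a stand-in, and the candidate vocabulary, threshold, and all names are hypothetical.

```python
from collections import Counter


def acquire_related_terms(term, cited_parts, candidate_vocab, min_relevance=2):
    """Return (candidate, relevance) pairs whose relevance meets the threshold."""
    counts = Counter()
    for part in cited_parts:
        text = part.lower()
        for cand in candidate_vocab:
            # Candidate acquisition: a vocabulary entry found in an acquired part,
            # excluding the query term itself.
            if cand.lower() != term.lower() and cand.lower() in text:
                # Relevance (assumed measure): number of parts mentioning the candidate.
                counts[cand] += 1
    # Determination: keep candidates at or above the threshold, highest first.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(cand, n) for cand, n in ranked if n >= min_relevance]


# Hypothetical parts (for example, titles) acquired from cited documents.
cited_parts = [
    "Statistical machine translation with phrase alignment",
    "Phrase alignment for parallel corpora",
    "A parallel corpus for machine translation",
]
vocab = ["phrase alignment", "parallel corpus", "machine translation", "speech synthesis"]
related = acquire_related_terms("machine translation", cited_parts, vocab)
```

Any other relevance measure could be substituted for the count without changing the three-stage structure.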
  • The document information storage unit may store at least a first type of document information and a second type of document information, the document information acquisition unit may acquire all or part of document information of the first type from the first type of document information stored in the document information storage unit, and the cited document information acquisition unit may acquire a part of the cited document information from the second type of document information stored in the document information storage unit.
  • A technical trend information creation device according to the present invention includes the following units.
  • A document information storage unit that stores a plurality of pieces of document information, each being information indicating at least a part of a technical document, together with at least timing information, which is information indicating a time relating to the technical document indicated by the document information.
  • A document information acquisition unit that acquires all or part of at least some of the plurality of pieces of document information stored in the document information storage unit.
  • A technical term extraction unit that extracts technical terms from all or part of the document information acquired by the document information acquisition unit.
  • A timing information acquisition unit that acquires, from the document information storage unit, the timing information corresponding to the document information from which the technical term extraction unit extracted the technical terms.
  • A technical trend information creation unit that creates, from the technical terms extracted by the technical term extraction unit and the timing information acquired by the timing information acquisition unit, technical trend information, which is information associating each technical term with the time of the technical documents including that technical term.
  • An output unit that outputs the technical trend information created by the technical trend information creation unit.
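The association between technical terms and timing information can be sketched as follows. This is a minimal illustration, assuming that timing information is a publication year and that a simple dictionary lookup stands in for the technical term extraction unit. The field names `text` and `year`, the function names, and the sample documents are all hypothetical.

```python
from collections import defaultdict


def create_trend_info(documents, extract_terms):
    """Map each extracted technical term to the sorted years of documents using it."""
    trend = defaultdict(set)
    for doc in documents:
        for term in extract_terms(doc["text"]):
            # Associate the term with the timing information of its document.
            trend[term].add(doc["year"])
    return {term: sorted(years) for term, years in trend.items()}


# Stand-in extractor (assumption): look up a fixed list of known technical terms.
KNOWN_TERMS = ["hidden Markov models", "support vector machines"]


def lookup_extractor(text):
    return [t for t in KNOWN_TERMS if t in text]


docs = [
    {"text": "Summarization using hidden Markov models", "year": 1999},
    {"text": "Summarization using support vector machines", "year": 2003},
    {"text": "Sentence ranking with support vector machines", "year": 2005},
]
trend_info = create_trend_info(docs, lookup_extractor)
```

Reading `trend_info` by year shows which techniques appear in a field at which time, which is the kind of overview the technical trend information is meant to give.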
  • The technical trend information creation device may further include a search term receiving unit that receives a search term and a search unit that searches the document information storage unit using the search term received by the search term receiving unit, and the document information acquisition unit may acquire all or part of the document information retrieved by the search unit.
  • The technical trend information creation device may further include a search term storage unit that stores two or more search terms and a search term selection accepting unit that accepts selection of any one of the two or more search terms stored in the search term storage unit, and the search term receiving unit may receive the search term indicated by the selection accepted by the search term selection accepting unit.
  • With such a configuration, by storing appropriate search terms (for example, terms indicating research fields) in the search term storage unit, the input of inappropriate search terms can be avoided.
  • As a result, an appropriate search can be performed and appropriate technical trend information can be created.
  • The technical trend information creation device may further include a technical term extraction unit that extracts technical terms from the document information stored in the document information storage unit, a suffix storage unit that stores predetermined suffixes, and a search term accumulation unit that, when a term formed by attaching a suffix stored in the suffix storage unit to a technical term extracted by the technical term extraction unit is included in the document information stored in the document information storage unit, stores that technical term as a search term in the search term storage unit.
  • With such a configuration, search terms can be stored in the search term storage unit automatically, without the user having to perform processing for storing them.
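The automatic accumulation of search terms can be sketched as follows, on one reading of the passage above: a technical term is registered as a search term when the term with a stored suffix attached appears somewhere in the stored document information. The suffixes, the space used to join term and suffix (an assumption made for English text), and all names are hypothetical.

```python
def register_search_terms(technical_terms, suffixes, documents, joiner=" "):
    """Return the technical terms that appear in some document with a stored suffix attached."""
    search_terms = []
    for term in technical_terms:
        # Build "term + suffix" variants, e.g. "machine translation research".
        variants = [term + joiner + suffix for suffix in suffixes]
        if any(variant in doc for variant in variants for doc in documents):
            search_terms.append(term)
    return search_terms


# Hypothetical stored data.
stored_suffixes = ["research", "system"]
stored_documents = [
    "This paper surveys machine translation research.",
    "We use a decision tree.",
]
registered = register_search_terms(
    ["machine translation", "decision tree"], stored_suffixes, stored_documents
)
```

Here only "machine translation" is registered, because "decision tree" never occurs with one of the stored suffixes attached.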
  • The search unit may search for document information corresponding to a technical document whose main topic is the search term received by the search term receiving unit.
  • The technical trend information creation device may further include a specifying unit that specifies document information in the same field as the document information retrieved by the search unit, and the document information acquisition unit may acquire from the document information storage unit all or part of the document information specified by the specifying unit.
  • The specifying unit may acquire, from the document information retrieved by the search unit, cited document identification information, which is information identifying another technical document cited in that document information, and may use the cited document identification information to specify document information in the same field as the document information retrieved by the search unit.
  • The specifying unit may also acquire, from the document information retrieved by the search unit, citation format information, which is information indicating the format in which another technical document is cited in that document information, and may specify document information in the same field as the document information retrieved by the search unit by using the cited document identification information identifying the technical documents cited in the citation format indicated by the citation format information.
  • The technical term extraction unit may include clue phrase storage means for storing one or more clue phrases, clue phrase search means for searching all or part of the document information acquired by the document information acquisition unit for a clue phrase, and technical term extraction means for extracting, as a technical term, a term having a predetermined relationship with the clue phrase found by the clue phrase search means. With such a configuration, technical terms can be extracted from all or part of the acquired document information by using clue phrases.
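Clue-phrase-based extraction can be sketched as follows. The passage does not specify the clue phrases or the "predetermined relationship", so this sketch assumes the English clue phrases "using" and "based on" and takes the relationship to be "the words immediately following the clue phrase, up to the next comma, period, or semicolon". The function name and sample text are likewise hypothetical.

```python
import re


def extract_technical_terms(text, clue_phrases):
    """Extract the word sequence that directly follows each clue phrase."""
    terms = []
    for clue in clue_phrases:
        # Capture everything after the clue phrase up to the next , . or ;
        pattern = re.compile(r"\b" + re.escape(clue) + r"\s+([^,.;]+)", re.IGNORECASE)
        for match in pattern.finditer(text):
            terms.append(match.group(1).strip())
    return terms


# Hypothetical document text.
sample_text = (
    "We classify documents using support vector machines. "
    "The parser is based on dynamic programming, which is fast."
)
extracted = extract_technical_terms(sample_text, ["using", "based on"])
```

The results are grouped by clue phrase in the order the phrases are given, not by position in the text.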
  • Alternatively, the technical term extraction unit may include clue phrase storage means for storing one or more clue phrases, clue phrase search means for searching all or part of the document information acquired by the document information acquisition unit for a clue phrase, and technical term extraction means for extracting, as a technical term, the term that the clue phrase found by the clue phrase search means associates with the search term received by the search term receiving unit.
  • Alternatively, the technical term extraction unit may include clue phrase storage means for storing one or more clue phrases, clue phrase search means for searching all or part of the document information acquired by the document information acquisition unit for a clue phrase, technical term storage means for storing two or more technical terms, and technical term extraction means for extracting a technical term stored in the technical term storage means when that technical term is included in the term having a predetermined relationship with the clue phrase found by the clue phrase search means.
  • Alternatively, the technical term extraction unit may include clue phrase storage means for storing one or more clue phrases, clue phrase search means for searching all or part of the document information acquired by the document information acquisition unit for a clue phrase, technical term candidate extraction means for extracting, as a technical term candidate, a term having a predetermined relationship with the clue phrase found by the clue phrase search means, and technical term selection means for selecting, as technical terms, all or some of the technical term candidates extracted by the technical term candidate extraction means.
  • With such a configuration, technical term candidates are first extracted by using clue phrases, and appropriate technical terms can then be selected from among those candidates.
  • The technical term extraction unit may further include technical term suffix storage means for storing one or more suffixes that a technical term can take, and the technical term selection means may select, as a technical term, a technical term candidate having a suffix stored in the technical term suffix storage means.
  • A term having a predetermined suffix is considered appropriate as a technical term, so with such a configuration an appropriate technical term can be selected.
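Suffix-based selection of technical term candidates can be sketched as follows. The suffix list is a hypothetical example for English text, since the passage does not list the stored suffixes, and the function name is an assumption.

```python
def select_by_suffix(candidates, suffixes):
    """Keep only the candidates that end with one of the stored suffixes."""
    lowered = tuple(s.lower() for s in suffixes)
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return [c for c in candidates if c.lower().endswith(lowered)]


# Hypothetical candidates and stored suffixes.
candidates = ["clustering method", "previous work", "language model"]
stored_suffixes = ["method", "model", "algorithm"]
selected = select_by_suffix(candidates, stored_suffixes)
```

A candidate such as "previous work" is dropped because it does not end with any stored suffix.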
  • The technical term selection means may select, as a technical term, a technical term candidate that is the main topic of any piece of document information stored in the document information storage unit.
  • The technical term extraction unit may further include document information specifying means for specifying document information in the same field as the document information from which the technical term candidates were extracted, and technical term suffix storage means for storing one or more suffixes that a technical term can take. In this case, the document information acquisition unit also acquires all or part of the document information specified by the document information specifying means.
  • The technical term selection means may then select, as technical terms, the technical term candidates extracted from all or part of one piece of document information when at least one of the technical term candidates extracted from all or part of that piece of document information, or from all or part of the document information specified as being in the same field as that piece, has a suffix stored in the technical term suffix storage means. With such a configuration, an appropriate technical term can be selected.
  • The technical trend information may be an image indicating, for each technical term, the time indicated by the timing information corresponding to the document information including that technical term, and the output unit may display the technical trend information.
  • The technical trend information may be an image in which, for each technical term, a document graphic corresponding to the document information including that technical term is displayed according to the time indicated by the timing information corresponding to that document information. In this case, the device may further include a graphic selection accepting unit that accepts selection of a document graphic in the technical trend information, and a display information acquisition unit that acquires from the document information storage unit display information, which is information relating to the document information corresponding to the selected document graphic and is to be displayed on the image of the technical trend information.
  • The technical trend information creation unit may then create technical trend information that displays the display information acquired by the display information acquisition unit.
  • With such a configuration, a user viewing the technical trend information image can easily obtain predetermined information about the document information including a technical term displayed in the image, which improves user convenience.
  • According to the related term acquisition device and the like of the present invention, related term information related to the term indicated by term information can be acquired.
  • According to the technical trend information creation device and the like of the present invention, it is possible to create technical trend information, which is information associating technical terms with times relating to the technical documents including those technical terms. By using the technical trend information, changes in the technology used in a given research field can therefore be understood easily.
  • FIG. 1 is a block diagram showing a configuration of a related term acquisition device 1 according to this embodiment.
  • The related term acquisition device 1 includes a document information storage unit 11, a term information receiving unit 12, a document information acquisition unit 13, a cited document information acquisition unit 14, a related term information acquisition unit 15, and a related term information output unit 16.
  • the document information storage unit 11 stores two or more document information.
  • the document information is information indicating a document.
  • The document may be any kind of document, such as an article, a patent document (for example, an application publication or a patent gazette), a Web page, an official gazette, or a book.
  • One piece of document information includes the information (data) of the one corresponding document.
  • One piece of document information may, however, include only the information needed for later processing.
  • For example, the document information stored in the document information storage unit 11 may be information including only the reference list and the title.
  • The document information may be a text file or a file in a proprietary format, such as that of word processing software.
  • The text file may be described in a markup language that uses tags to describe the document, or may simply present the content of the corresponding document as plain text data.
  • The document information is assumed to have a structure in at least part of it.
  • The structure may or may not be specified by a tag or a heading. In the latter case, for example, it may only become clear by reading a paragraph of the document indicated by the document information that a summary is written there, and by reading another paragraph that the purpose is written there.
  • The structure may or may not be indicated by structure description information, which is information that the document information has and that describes the structure of the document information.
  • The structure description information is, for example, a heading tag used in a Japanese patent specification or a tag used in an HTML file.
  • Alternatively, it may be heading text such as “Summary” or “References” included in the document information, as in an ordinary paper.
  • The two or more pieces of document information stored in the document information storage unit 11 include at least document information indicating a document that cites another document. The other documents cited in a given piece of document information are called cited documents.
  • Cited document identification information, which is information for identifying a cited document, is included in the document information indicating a document that cites another document.
  • The cited document identification information may be, for example, an ID identifying the cited document, such as a patent number, the publication number of a patent application, or a PMID (PubMed Identifier), or it may be bibliographic information capable of identifying the cited document, such as the author name, title, publication date, and information on the publication medium (journal name, publisher name, URL, etc.).
  • The document information storage unit 11 can be realized by a predetermined recording medium (for example, a semiconductor memory, a magnetic disk, or an optical disk). The process by which the document information comes to be stored in the document information storage unit 11 does not matter.
  • For example, document information may be stored in the document information storage unit 11 via a recording medium, document information transmitted via a communication line or the like may be stored in the document information storage unit 11, or document information input via an input device may be stored in the document information storage unit 11.
  • the term information receiving unit 12 receives term information.
  • term information is information indicating a term. This term information is generally information indicating the text of the term, but may be other information as long as it indicates the term.
  • The term information receiving unit 12 may, for example, receive information input from an input device (for example, a keyboard, a mouse, or a touch panel), receive information transmitted via a wired or wireless communication line, or accept information read from a predetermined recording medium (for example, an optical disk, a magnetic disk, or a semiconductor memory).
  • The term information receiving unit 12 may or may not include a device for receiving (for example, a modem or a network card). Further, the term information receiving unit 12 may be realized by hardware or by software such as a driver that drives a predetermined device.
  • The term information receiving unit 12 may receive information other than the term information. For example, when a search is performed using the related term information acquired by the related term acquisition device 1, a search key used in that search (for example, an international patent classification) may be received together with the term information. Even in such a case, the subsequent processing, such as the acquisition of document information, is performed using the term information received by the term information receiving unit 12.
  • The document information acquisition unit 13 searches the two or more pieces of document information stored in the document information storage unit 11 for document information having, in all or part of the document information, the term indicated by the term information received by the term information receiving unit 12.
  • Document information having the term in all of the document information means that the term is included somewhere in the document information.
  • Document information having the term in part of the document information means that the term is included in a partial area of the document information, for example a predetermined area such as the title or the claims.
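The difference between searching all of the document information and searching only a part of it can be sketched as follows. This is an illustration only: document information is modeled as a dictionary of named fields, and the field names, document IDs, and function name are hypothetical.

```python
def search_documents(documents, term, items=None):
    """Return IDs of documents containing the term in all fields or only in the given items."""
    hits = []
    needle = term.lower()
    for doc in documents:
        fields = doc["fields"]
        if items is None:
            # Search all of the document information.
            targets = list(fields.values())
        else:
            # Search only the predetermined parts (for example, the title).
            targets = [fields.get(item, "") for item in items]
        if any(needle in text.lower() for text in targets):
            hits.append(doc["id"])
    return hits


# Hypothetical stored document information.
stored = [
    {"id": "D1", "fields": {"title": "Speech recognition device", "body": "Uses a language model."}},
    {"id": "D2", "fields": {"title": "Language model training", "body": "A training procedure."}},
]
anywhere = search_documents(stored, "language model")
title_only = search_documents(stored, "language model", items=["title"])
```

Restricting the search to the title narrows the result to documents for which the term is more likely to be a central topic.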
  • When the document information acquisition unit 13 searches for document information having the term indicated by the term information in part of the document information, information indicating the item corresponding to that part may, for example, be stored in a recording medium (not shown), and the document information acquisition unit 13 may read the information indicating the item from the recording medium and perform the search.
  • The document information acquisition unit 13 acquires all of the retrieved document information, or a part of the retrieved document information, from the document information storage unit 11.
  • The part of the document information is a part having at least the cited document identification information, which is information identifying other documents cited in the retrieved document information. The document information acquisition unit 13 therefore acquires all or a part of the document information that includes at least the cited document identification information.
  • To acquire the part having at least the cited document identification information, the document information acquisition unit 13 may, for example, acquire the reference column, or the patent literature and non-patent literature columns, which are considered to include at least the cited document identification information, or it may use a technique such as pattern matching.
  • For example, the document information acquisition unit 13 may read information indicating an item from a recording medium (not shown) and acquire the part of the document information corresponding to that item.
  • Alternatively, information indicating a pattern used for pattern matching or the like may be stored in a recording medium (not shown), and the document information acquisition unit 13 may read the pattern and acquire a part of the document information by performing pattern matching or the like.
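Acquiring cited document identification information by pattern matching can be sketched as follows for the PMID case mentioned earlier. The regular expression assumes that references are written in the form "PMID: digits". The reference text and the ID values are invented for illustration and do not refer to real articles.

```python
import re

# Assumed pattern: the literal "PMID:" followed by optional spaces and digits.
PMID_PATTERN = re.compile(r"PMID:\s*(\d+)")


def extract_cited_ids(document_text):
    """Return the PMID strings found in the document text, in order of appearance."""
    return PMID_PATTERN.findall(document_text)


# Hypothetical reference section (IDs are made up).
references = (
    "[1] A. Author. Title one. Journal A, 2001. PMID: 12345678. "
    "[2] B. Author. Title two. Journal B, 2003. PMID:2345678."
)
cited_ids = extract_cited_ids(references)
```

Other kinds of identifiers, such as patent numbers, would need their own patterns stored in the recording medium.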
  • When the document information acquisition unit 13 searches for document information having the term indicated by the term information in part of the document information and also acquires a part of the retrieved document information, the part used in the search and the part acquired may be the same part of the document information or different parts.
  • The cited document information acquisition unit 14 searches the two or more pieces of document information stored in the document information storage unit 11 for the cited document information.
  • The cited document information is document information indicating the document identified by the cited document identification information included in all or part of the document information acquired by the document information acquisition unit 13. The cited document information acquisition unit 14 then acquires a part of the retrieved cited document information from the document information storage unit 11.
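Resolving cited document identification information against the stored document information can be sketched as follows. This is a sketch under assumptions: the store is modeled as a dictionary keyed by document ID, each piece of document information carries a hypothetical `cites` list of IDs, and identifiers not found in the store are simply skipped.

```python
def acquire_cited_documents(store, acquired_docs):
    """Look up, in the store, the documents cited by the acquired documents."""
    cited = []
    seen = set()
    for doc in acquired_docs:
        for cited_id in doc.get("cites", []):
            # Skip identifiers with no stored document, and avoid duplicates.
            if cited_id in store and cited_id not in seen:
                seen.add(cited_id)
                cited.append(store[cited_id])
    return cited


# Hypothetical document information storage.
store = {
    "P1": {"id": "P1", "title": "Translation device", "cites": ["P2", "P3"]},
    "P2": {"id": "P2", "title": "Alignment method", "cites": []},
    "P3": {"id": "P3", "title": "Bilingual dictionary", "cites": []},
    "P4": {"id": "P4", "title": "Translation memory", "cites": ["P3", "P9"]},
}
cited_docs = acquire_cited_documents(store, [store["P1"], store["P4"]])
cited_ids = [d["id"] for d in cited_docs]
```

Only the part of each cited document corresponding to the predetermined item would then be passed on for related term acquisition.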
  • a part of the cited document information acquired by the cited document information acquiring unit 14 is a part of the cited document information corresponding to the item that has been determined in advance.
  • the part that corresponds to the item that has been determined is the part that describes the item.
  • An item is an area that can be classified in a document.
  • Examples of items include a title, a title of the invention, an abstract, the claims, an individual claim, a premise part (preamble) of a claim, the claim described first in the claims (hereinafter referred to as the “first claim”), claims subordinate to the first claim, the object of the invention, and the effects of the invention.
  • the predetermined items may be a single item or two or more items.
  • the item may or may not be manifested in the cited reference information.
  • the fact that an item is manifested means that the part corresponding to that item is clearly indicated in the cited document information.
  • For example, the item may be manifested by displaying the item name as text, or it may be manifested by a tag.
  • the cited document information acquisition unit 14 can know what items are described in a predetermined part of the document.
  • the cited document information obtaining unit 14 obtains a portion of the cited document information corresponding to a predetermined item.
  • For example, information indicating the predetermined item is stored in a recording medium (not shown) accessible to the cited document information acquisition unit 14, and the cited document information acquisition unit 14 accesses the recording medium and reads the information indicating the item. Then, when the item is manifested in the cited document information, the cited document information acquisition unit 14 acquires the portion of the cited document information corresponding to the read item.
  • When the item is not manifested in the cited document information, the cited document information acquisition unit 14 specifies the portion of the cited document information corresponding to the item and acquires the specified portion.
  • specification of the portion of the cited document information corresponding to the item will be described.
  • The cited document information acquisition unit 14 holds, on a recording medium, condition information indicating the condition for specifying the portion corresponding to each item.
  • The cited document information acquisition unit 14 reads the condition information corresponding to the item and identifies the portion of the cited document information that satisfies the condition indicated by that condition information.
  • The cited document information acquisition unit 14 may acquire a part of the cited document information indicating the cited document identified by the cited document identification information included in all or part of the document information acquired by the document information acquisition unit 13, namely the part that is associated with a predetermined item by structure description information.
  • As long as the cited document information acquisition unit 14 can, as a result, acquire a part of the cited document information as described above, the process by which it does so is not limited. For example, when the document information acquisition unit 13 acquires information including the title of the cited document as the cited document identification information, and the cited document information acquisition unit 14 is to acquire the part of the cited document information corresponding to the item “title”, the cited document information acquisition unit 14 may acquire that part by taking the portion corresponding to the title from the cited document identification information acquired by the document information acquisition unit 13.
  • the related term information acquisition unit 15 acquires related term information from a part of the cited document information acquired by the cited document information acquisition unit 14.
  • the related term information is information indicating related terms related to the term indicated by the term information received by the term information receiving unit 12.
  • the detailed configuration of the related term information acquisition unit 15 will be described later with reference to FIG.
  • the related term information output unit 16 outputs the related term information acquired by the related term information acquisition unit 15.
  • This output may be, for example, display on a display device (for example, a CRT or a liquid crystal display), transmission via a communication line to a predetermined device, printing by a printer, audio output from a speaker, or storage in a recording medium.
  • the related term information output unit 16 may or may not include an output device (for example, a display device or a printer).
  • the related term information output unit 16 may be realized by hardware, or may be realized by software such as a driver for driving these devices.
  • FIG. 2 is a block diagram showing a configuration of the related term information acquisition unit 15.
  • the related term information acquisition unit 15 includes a related term candidate information acquisition unit 21, a relevance degree calculation unit 22, and a related term information determination unit 23.
  • The related term candidate information acquisition means 21 acquires related term candidate information, which is information that is a candidate for the related term information, from the part of the cited document information acquired by the cited document information acquisition unit 14.
  • The related term candidate information acquisition means 21 may store the acquired related term candidate information in a recording medium (not shown).
  • various methods for acquiring the related term candidate information will be described here.
  • The related term candidate information acquisition means 21 may perform morphological analysis on the part of the cited document information acquired by the cited document information acquisition unit 14 and acquire information indicating single nouns and/or compound nouns as related term candidate information.
  • The related term candidate information acquisition means 21 may also acquire information indicating unknown words as related term candidate information, in addition to the information indicating single nouns and the like.
  • the morpheme analysis is a process of dividing a natural language document into morphemes and recognizing the parts of speech of the morphemes.
  • As a morphological analysis system for Japanese, “ChaSen” (http://chasen.naist.jp), developed at the Nara Institute of Science and Technology, is known.
  • nouns are simple nouns, unknown words, or compound nouns that are connected together.
  • Software that assigns parts of speech to English words includes “TnT” (http://www.coli.uni-saarland.de/~thorsten/tnt/) and “Brill Tagger” (http://www.cs.jhu.edu/~brill/).
  • the method of acquiring information indicating a single noun as related term candidate information has been disclosed in the above-mentioned Non-Patent Document 4 etc. Therefore, detailed description thereof is omitted.
  • the related term candidate information acquisition means 21 may acquire information indicating technical terms as related term candidate information from a part of the cited reference information acquired by the cited reference information acquisition unit 14.
  • a method for acquiring information indicating technical terms will be described.
  • The related term candidate information acquisition means 21 extracts information indicating single nouns and the like from the part of the cited document information acquired by the cited document information acquisition unit 14, in the same manner as in “Related Term Candidate Information Acquisition Method 1” above. It then assigns each single noun or the like a score indicating the possibility of its being a technical term, and acquires information indicating the single nouns and the like with high scores as related term candidate information.
  • a plurality of types are known as methods for assigning scores.
  • Word-making ability is a measure of how easily a single noun forms compound words.
  • A high score is given to information indicating single nouns and the like that include nouns with high word-making ability.
  • The connection frequency may be used as an index of word-making ability.
  • The connection frequency of a word is the number of different words connected to that word, that is, the number of types of words that appear immediately before or after it.
  • The connection frequency thus measures how many compound nouns a word is included in, in other words, how easily the word forms compound nouns. In this way, a score can be given to each single noun or the like based on word-making ability.
  • A compound noun may be given, as its score, the geometric mean of the scores of its constituent single nouns.
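  • The scoring by connection frequency described above can be sketched as follows. This is a minimal illustration, not the method of the cited documents: the function names, the input format (each compound noun as a list of single nouns), and the choice to add 1 to every connection frequency so that the geometric mean never collapses to zero are all assumptions.

```python
from collections import defaultdict
from math import prod


def connection_frequencies(compounds):
    """For each single noun, count the distinct words seen directly before or after it."""
    left = defaultdict(set)   # distinct words appearing immediately before a noun
    right = defaultdict(set)  # distinct words appearing immediately after a noun
    for words in compounds:
        for prev, nxt in zip(words, words[1:]):
            right[prev].add(nxt)
            left[nxt].add(prev)
    nouns = set(left) | set(right)
    return {n: len(left[n]) + len(right[n]) for n in nouns}


def compound_score(words, freq):
    """Geometric mean of (connection frequency + 1) over the constituent single nouns.

    The +1 smoothing is an assumption made for this sketch.
    """
    values = [freq.get(w, 0) + 1 for w in words]
    return prod(values) ** (1.0 / len(values))


# Hypothetical compound nouns, already split into single nouns.
compounds = [
    ["semiconductor", "memory"],
    ["semiconductor", "laser"],
    ["memory", "cell"],
]
freq = connection_frequencies(compounds)
score = compound_score(["semiconductor", "memory"], freq)
```

  • Nouns that combine with many different neighbours, such as “semiconductor” in this toy data, receive a higher connection frequency and therefore raise the score of every compound noun that contains them.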
  • Let t be the single noun or the like of interest.
  • Let n be the number of constituent words of t.
  • The score corresponding to t is calculated as follows.
  • $\mathrm{score}(t) = \dfrac{\text{frequency of occurrence of } t}{\text{average frequency of occurrence of single nouns etc. whose number of constituent words is } n}$
  • The “single nouns etc.” in the above formula are the single nouns and the like extracted by the related term candidate information acquisition means 21.
  • the appearance frequency of t and the average appearance frequency of single nouns and the like with n as the number of words are calculated in a part of the cited document information acquired by the cited document information acquisition unit 14.
  • The frequency of occurrence of t may also simply be used as the score of t.
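  • The frequency-based score described above can be sketched as follows. The function name and the input format (each occurrence of a candidate given as a tuple of its constituent words) are hypothetical, and the sketch assumes that the average is taken over the distinct candidates that have the same number of constituent words.

```python
from collections import Counter
from statistics import mean


def frequency_scores(occurrences):
    """Score each candidate t as freq(t) / (mean freq of candidates with the same word count)."""
    freq = Counter(occurrences)  # key: tuple of constituent words
    by_length = {}
    for term, count in freq.items():
        by_length.setdefault(len(term), []).append(count)
    average = {n: mean(counts) for n, counts in by_length.items()}
    return {term: count / average[len(term)] for term, count in freq.items()}


# Hypothetical occurrences extracted from parts of cited document information.
occurrences = [
    ("memory",), ("memory",), ("memory",), ("cell",),
    ("memory", "cell"), ("memory", "cell"),
]
scores = frequency_scores(occurrences)
```

  • Dividing by the average for the same number of words keeps long compound nouns, which naturally occur less often, comparable with short ones.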
  • scoring may be performed using a method other than the above description, and information indicating technical terms may be acquired as related term candidate information using a method other than the above description.
  • Methods for acquiring information indicating technical terms as related term candidate information are disclosed in Non-Patent Document 1, Non-Patent Document 3, and the following documents, and are conventionally known, so a detailed description thereof is omitted.
  • the related term candidate information acquisition means 21 analyzes the structure of a part of the cited reference information acquired by the cited reference information acquisition unit 14 to identify a group of nouns, and includes single nouns included in the specified noun group. May be acquired as related term candidate information.
  • a group of nouns is a group including one or more single nouns.
  • a part of the cited document information acquired by the cited document information acquiring unit 14 is a claim of a patent application document as shown below.
  • a load detection means for detecting a load created in the work mechanism of the work described above, and a frequency corresponding to the detection value of the load detection means
  • a first frequency converter that outputs a first frequency signal
  • a second frequency converter that outputs a pulse of a frequency corresponding to the detection value of the load detecting means, and the first frequency modulation output signal.
  • the first feature is that there is a “group of nouns” at the end of the claim.
  • “virtual vibration generating device for operating a work machine” corresponds to the group of nouns.
  • The second feature is that, when the claim is divided into a first half and a second half by a character string such as “in” or “that is”, a “group of nouns” similar to the one at the end also exists immediately before that character string.
  • “work machine” corresponds to a group of the nouns.
  • The related term candidate information acquisition means 21 may identify the group of nouns at the end of the claim and the group of nouns immediately before a character string such as “in”, and may acquire information indicating the single nouns and the like included in these groups of nouns as related term candidate information.
  • The related term candidate information acquisition means 21 can specify the location of the group of nouns immediately before “in” by using “in” as a clue.
  • A method for analyzing a document structure to identify a group of nouns and obtaining information indicating the single nouns included in the identified group of nouns is disclosed in the following documents and is conventionally known, so a detailed description thereof is omitted.
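  • The two structural clues described above can be sketched as follows. This is only an illustration under strong assumptions: the claim is assumed to be already tokenized and tagged as (word, part of speech) pairs in the original word order, where the head group of nouns comes last, and the tag name "NOUN", the cue list, and the function names are hypothetical.

```python
def trailing_noun_run(tokens):
    """Return the maximal run of noun tokens at the end of a (word, pos) list."""
    run = []
    for word, pos in reversed(tokens):
        if pos != "NOUN":
            break
        run.append(word)
    return run[::-1]


def noun_groups(tokens, cues=("in",)):
    """Group of nouns just before the first cue word, plus the group ending the claim."""
    groups = []
    for i, (word, _) in enumerate(tokens):
        if word in cues:
            groups.append(trailing_noun_run(tokens[:i]))
            break
    groups.append(trailing_noun_run(tokens))
    return [g for g in groups if g]


# Hypothetical gloss of a claim in its original word order.
claim = [
    ("work", "NOUN"), ("machine", "NOUN"), ("in", "CUE"),
    ("load", "NOUN"), ("detects", "VERB"),
    ("vibration", "NOUN"), ("generating", "NOUN"), ("device", "NOUN"),
]
groups = noun_groups(claim)
```

  • The single nouns inside each returned group would then be taken as related term candidates.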
  • The related term candidate information acquisition means 21 may exclude information indicating frequently appearing common nouns from the related term candidate information.
  • For example, the related term candidate information acquisition means 21 may hold, in a recording medium (not shown), one or more pieces of information indicating nouns to be excluded and, if the information acquired as related term candidate information includes any of the information indicating the nouns to be excluded, may exclude that information from the related term candidate information. Examples of nouns to be excluded include “method”, “description”, “publication”, and “literature”.
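  • A minimal sketch of this exclusion step is shown below. The set contents come from the examples named above; the names EXCLUDED_NOUNS and remove_excluded are hypothetical, and an in-memory set stands in for the recording medium.

```python
# Nouns to be excluded (examples taken from the description above).
EXCLUDED_NOUNS = {"method", "description", "publication", "literature"}


def remove_excluded(candidates, excluded=EXCLUDED_NOUNS):
    """Drop candidates that appear in the exclusion list, keeping the original order."""
    return [c for c in candidates if c not in excluded]


candidates = ["semiconductor storage device", "method", "memory cell", "literature"]
filtered = remove_excluded(candidates)
```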
  • the degree-of-association calculating unit 22 calculates the degree of association of terms indicated by the related term candidate information acquired by the related term candidate information acquiring unit 21 for each piece of related term candidate information.
  • The degree-of-association calculating means 22 may store the calculated degree of association and the related term candidate information acquired by the related term candidate information acquiring means 21, in association with each other, in a recording medium (not shown).
  • The degree-of-association calculating means 22 may use, as the degree of association of a piece of related term candidate information, the frequency with which the term indicated by that related term candidate information appears in the part of the cited document information acquired by the cited document information acquisition unit 14 (or, if two or more parts have been acquired, in the set of those parts). Alternatively, for each piece of related term candidate information, the degree-of-association calculating means 22 may calculate a score indicating the possibility of being a technical term, as described in “Related Term Candidate Information Acquisition Method 2”, for example a score based on word-making ability, a score based on appearance frequency, or a score based on connection frequency, and may use that score as the degree of association.
  • the relevance degree calculation unit 22 may acquire the score from the related term candidate information acquisition unit 21. Alternatively, a new score may be calculated. Accordingly, the calculation of the relevance by the relevance calculation means 22 includes acquisition of the relevance. Further, the degree-of-association calculating unit 22 may calculate the degree of association using both the appearance frequency of the term indicated by the related term candidate information and the above score. For example, the degree of association may be calculated by adding the two, or the degree of association may be calculated by multiplying the two.
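  • The two ways of combining appearance frequency and score mentioned above can be sketched as follows. The function name and the mode parameter are hypothetical; the description only states that the two values may be added or multiplied.

```python
def degree_of_association(frequency, score, mode="add"):
    """Combine appearance frequency and technical-term score into one degree of association."""
    if mode == "add":
        return frequency + score
    if mode == "multiply":
        return frequency * score
    raise ValueError(f"unknown mode: {mode}")


added = degree_of_association(4, 0.5)
multiplied = degree_of_association(4, 0.5, mode="multiply")
```

  • Multiplication lets a low score suppress a frequent but generic term, whereas addition keeps frequent terms ranked high regardless of score; which behaves better would have to be checked experimentally.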
  • The related term information determining means 23 determines, as related term information, the related term candidate information whose corresponding degree of association is high, from among the related term candidate information acquired by the related term candidate information acquiring means 21.
  • For example, the related term information determining means 23 may determine, as related term information, a fixed number (for example, 30) or a fixed percentage (for example, 30%) of the pieces of related term candidate information acquired by the related term candidate information acquiring means 21, taken in descending order of degree of association. It may also determine, as related term information, the related term candidate information whose corresponding degree of association is equal to or higher than a predetermined threshold value, or it may use another method to determine related term candidate information with a high corresponding degree of association as related term information.
  • For example, the related term information determining means 23 sorts the degrees of association calculated by the degree-of-association calculating means 22, selects the related term candidate information corresponding to the top-ranked degrees of association, and determines the related term information by storing the selected related term candidate information in a recording medium (not shown).
  • the related term information stored in the recording medium (not shown) is the related term information output by the related term information output unit 16.
  • the related term information determining means 23 may store the related term information in a recording medium (not shown). In this case, the stored related term information is read and output by the related term information output unit 16.
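  • The selection rules described above (a fixed number, a fixed percentage, or a threshold) can be sketched as follows. The function name, the parameter names, and the dictionary input are assumptions made for illustration.

```python
def select_related_terms(association, top_n=None, top_ratio=None, threshold=None):
    """Pick the candidates with the highest degree of association.

    association: dict mapping a candidate term to its degree of association.
    """
    ranked = sorted(association.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(t, a) for t, a in ranked if a >= threshold]
    if top_ratio is not None:
        ranked = ranked[: int(len(ranked) * top_ratio)]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [t for t, _ in ranked]


association = {
    "semiconductor storage device": 5,
    "memory": 3,
    "circuit": 1,
    "signal": 1,
}
by_count = select_related_terms(association, top_n=2)
by_ratio = select_related_terms(association, top_ratio=0.5)
by_threshold = select_related_terms(association, threshold=3)
```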
  • FIG. 3 is a flowchart showing the overall operation of the related term acquisition device 1.
  • Step S101 The term information receiving unit 12 determines whether or not the term information is received.
  • If the term information has been received, the process proceeds to step S102. If not, the process of step S101 is repeated until the term information is received.
  • Step S102 The document information acquisition unit 13 searches the two or more pieces of document information stored in the document information storage unit 11 for document information having the term indicated by the term information received by the term information receiving unit 12 in all or part of the document information. It then acquires, from the document information storage unit 11, all of the retrieved document information, or a part of the retrieved document information that has at least the cited document identification information, which is information identifying the documents cited in the retrieved document information. Details of this process will be described later using the flowchart of FIG. 4.
  • Step S103 The cited document information acquisition unit 14 searches the two or more pieces of document information stored in the document information storage unit 11 for the cited document information, which is information indicating the document identified by the cited document identification information included in all or part of the document information acquired by the document information acquisition unit 13, and acquires, from the document information storage unit 11, the part of the retrieved cited document information that corresponds to a predetermined item. Details of this process will be described later using the flowchart of FIG. 5.
  • Step S104 The related term information acquisition unit 15 acquires, from the part of the cited document information acquired by the cited document information acquisition unit 14, related term information, which is information indicating related terms related to the term indicated by the term information received by the term information receiving unit 12. Details of this process will be described later using the flowchart of FIG. 6.
  • Step S105 The related term information output unit 16 outputs the related term information acquired by the related term information acquisition unit 15. Then, the process returns to step S101.
  • FIG. 4 is a flowchart showing details of the document information acquisition process in step S 102 in the flowchart of FIG.
  • Step S201 The document information acquisition unit 13 searches the two or more pieces of document information stored in the document information storage unit 11 for document information that includes, in all or part of it, the term indicated by the term information received by the term information receiving unit 12.
  • Step S202 The document information acquisition unit 13 sets a counter i to 1.
  • Step S203 The document information acquisition unit 13 determines whether or not the i-th document information retrieved in step S201 exists. If it exists, the process proceeds to step S204. If it does not exist, the series of processing ends, and the process proceeds to step S103 in the flowchart of FIG. 3.
  • Step S204 The document information acquisition unit 13 acquires all or a part of the i-th document information retrieved.
  • When a part is acquired, a part including at least the cited document identification information is acquired.
  • In some cases, the document information acquisition unit 13 need not acquire anything.
  • Step S205 The document information acquisition unit 13 increments the counter i by 1 and returns to step S203.
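  • The loop of the flowchart of FIG. 4 can be sketched as follows. The in-memory list of dictionaries standing in for the document information storage unit 11, the keys "id", "text" and "citations", and the function name are all hypothetical; the sketch also assumes that nothing is acquired from a retrieved document that has no cited document identification information.

```python
def acquire_document_parts(term, documents):
    """Search by term, then keep the citation part of each retrieved document."""
    hits = [d for d in documents if term in d["text"]]  # search step (S201)
    acquired = []
    i = 0                                               # counter i (0-based in this sketch)
    while i < len(hits):                                # does the i-th hit exist? (S203)
        citations = hits[i].get("citations", [])
        if citations:                                   # acquire the citation part (S204)
            acquired.append({"id": hits[i]["id"], "citations": citations})
        i += 1                                          # increment counter (S205)
    return acquired


# Hypothetical stored document information.
documents = [
    {"id": "P1", "text": "a DRAM cell array", "citations": ["C1", "C2"]},
    {"id": "P2", "text": "a word processor", "citations": ["C3"]},
    {"id": "P3", "text": "DRAM refresh control", "citations": []},
]
acquired = acquire_document_parts("DRAM", documents)
```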
  • FIG. 5 is a flowchart showing details of the cited document information acquisition process of step S103 in the flowchart of FIG. 3.
  • Step S301 The cited document information acquisition unit 14 sets the counter i to 1.
  • Step S302 The cited document information acquisition unit 14 determines whether or not the i-th document information acquired by the document information acquisition unit 13 exists. If it exists, the process proceeds to step S303. If it does not exist, the series of processing ends, and the process proceeds to step S104 in the flowchart of FIG. 3.
  • Step S303 The cited document information acquisition unit 14 acquires all the cited document identification information included in the i-th document information acquired by the document information acquisition unit 13.
  • Step S304 The cited document information acquisition unit 14 sets the counter j to 1.
  • Step S305 The cited document information acquisition unit 14 determines whether or not the j-th cited document identification information acquired in step S303 exists. If it exists, the process proceeds to step S307, and if it does not exist, the process proceeds to step S306.
  • Step S306 The cited document information acquisition unit 14 increments the counter i by one. Then, the process returns to step S302.
  • Step S307 The cited document information acquisition unit 14 searches the document information storage unit 11 for the cited document information identified by the j-th cited document identification information acquired in step S303, and acquires a predetermined part of the retrieved cited document information from the document information storage unit 11.
  • Step S308 The cited document information acquisition unit 14 determines whether a part of the cited document information acquired in Step S307 has already been acquired. If it has already been acquired, the process proceeds to step S310, and if not, the process proceeds to step S309.
  • Step S309 The cited document information acquisition unit 14 temporarily stores the part of the cited document information acquired in step S307. The parts of the cited document information stored in step S309 become the parts of the cited document information finally acquired by the cited document information acquisition unit 14.
  • Step S310 The cited document information acquisition unit 14 increments the counter j by 1. Then, the process returns to step S305.
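  • The nested loop of the flowchart of FIG. 5 can be sketched as follows. The dictionary standing in for the document information storage unit 11, the item name "claims", and the function name are hypothetical, and the check for an already acquired part is implemented here by remembering the identifiers of cited documents, which is an assumption.

```python
def acquire_cited_parts(acquired, store, item="claims"):
    """Follow each piece of cited document identification information and collect the item once."""
    parts = []
    seen = set()
    for doc in acquired:                   # counter i (S301, S302, S306)
        for cited_id in doc["citations"]:  # counter j (S303 to S305, S310)
            cited = store.get(cited_id)    # look up the cited document (S307)
            if cited is None or item not in cited:
                continue
            if cited_id in seen:           # already acquired? (S308)
                continue
            seen.add(cited_id)
            parts.append(cited[item])      # temporarily store the part (S309)
    return parts


# Hypothetical stored cited documents and acquired document information.
store = {
    "C1": {"claims": "A semiconductor storage device."},
    "C2": {"title": "Untitled"},
}
acquired = [
    {"id": "P1", "citations": ["C1", "C2"]},
    {"id": "P2", "citations": ["C1", "C9"]},
]
parts = acquire_cited_parts(acquired, store)
```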
  • FIG. 6 is a flowchart showing details of the related term information acquisition process of step S104 in the flowchart of FIG. 3.
  • Step S401 The related term candidate information acquisition means 21 acquires related term candidate information from the part of the cited document information acquired by the cited document information acquisition unit 14 (if two or more parts of cited document information have been acquired, from the set of those parts), that is, from the parts of the cited document information temporarily stored in step S309.
  • Step S402 The degree-of-association calculating means 22 sets the counter i to 1.
  • Step S403 The degree-of-association calculating means 22 determines whether or not the i-th related term candidate information acquired by the related term candidate information acquisition means 21 from the part of the cited document information exists. If it exists, the process proceeds to step S404. If it does not exist, the process proceeds to step S407.
  • Step S404 The degree-of-association calculating means 22 calculates the degree of association of the i-th related term candidate information acquired by the related term candidate information acquisition means 21 from the part of the cited document information.
  • Step S405 The related term information determining means 23 temporarily stores the degree of association calculated in step S404 in association with the corresponding related term candidate information.
  • The related term candidate information corresponding to the degree of association calculated in step S404 is the i-th related term candidate information acquired by the related term candidate information acquisition means 21 from the part of the cited document information.
  • Step S406 The degree-of-association calculating means 22 increments the counter i by 1. Then, the process returns to step S403.
  • Step S407 The related term information determining means 23 sorts the relevance temporarily stored in Step S405.
  • Step S408 The related term information determining means 23 determines, as related term information, the related term candidate information corresponding to the top-ranked degrees of association sorted in step S407. Then, the series of processing ends, and the process proceeds to step S105 in the flowchart of FIG. 3.
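  • The flow of FIG. 6 can be sketched as follows. The function name, the extract callback (a stand-in for whichever candidate acquisition method is used), and the use of plain appearance frequency as the degree of association are assumptions; ties are broken alphabetically only to make the result deterministic.

```python
from collections import Counter


def acquire_related_terms(parts, extract, top_n=3):
    """Extract candidates, use their frequency as the degree of association, keep the top ones."""
    candidates = []
    for part in parts:
        candidates.extend(extract(part))  # candidate acquisition (S401)
    association = Counter(candidates)     # degree of association per candidate (S402 to S406)
    ranked = sorted(association.items(), key=lambda kv: (-kv[1], kv[0]))  # sort (S407)
    return [term for term, _ in ranked[:top_n]]                          # determine (S408)


# Hypothetical parts of cited document information; whitespace splitting stands in
# for morphological analysis.
parts = ["storage device circuit", "storage device", "storage"]
related = acquire_related_terms(parts, extract=lambda text: text.lower().split(), top_n=2)
```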
  • The document information is document information related to an academic paper.
  • the cited document information acquired by the cited document information acquiring unit 14 is document information related to a patent. Therefore, by inputting the term information indicating the academic paper term into the related term acquisition device 1, the related term information indicating the patent term related to the academic paper term is output.
  • academic paper terms are terms that are used relatively frequently in academic papers (for example, specific subordinate terms such as DRAM and word processor), and patent terms are used relatively frequently in patent documents. It is a term (for example, an abstract high-level term such as a semiconductor storage device or a document editing device).
  • the information of the patent publication (1993 to 2002) is included in the document information stored in the document information storage unit 11.
  • Bibliographic information on about 85,000 papers cited in the patent publications is also included in the document information stored in the document information storage unit 11.
  • the method disclosed in the following document was used as a method for extracting the bibliographic information of the paper cited in the patent publication from the patent publication.
  • Nouns were extracted from the information of the patent publications stored in the document information storage unit 11. The extracted nouns were then arranged in order of frequency, and 25 terms were selected manually from among the most frequent ones. Specifically, “CPU”, “semiconductor laser”, “DRAM”, “memory cell”, “word processor”, “novolak resin”, “CD”, etc. were selected.
  • the pattern of the portion of the cited document information acquired by the cited document information acquiring unit 14 is as follows.
  • the pattern in which the related term candidate information acquisition unit 21 acquires the related term candidate information is as follows.
  • the related term candidate information acquisition unit 21 does not acquire a related term candidate that matches a phrase included in the unnecessary phrase list by using a predetermined unnecessary phrase list.
  • Terms such as “method” and “description” appear in many claims regardless of the field, so such terms are excluded from the related term candidates.
  • To create the unnecessary word list, the nouns included in the patent publications for the 10 years were arranged in order of appearance frequency, and the words to be included in the list were selected manually from among the most frequent ones. For example, a total of 350 unnecessary words, such as method, description, publication, literature, use, use, details, manufacturing, proposal, manufacturing method, and data, are included in the unnecessary word list.
  • In this experimental example, the relevance calculation means 22 does not calculate the degree of association.
  • the related term information determining unit 23 determines the related term candidate information acquired by the related term candidate information acquiring unit 21 as the related term information as it is.
  • For each of the 25 pieces of term information described above, the process from reception of the term information by the term information receiving unit 12 to output of the related term information related to that term information by the related term information output unit 16 was executed.
  • the related term information output unit 16 stores the related term information in a predetermined recording medium. After that, the related term information was used for verification.
  • n is the number of correct answers for the input.
  • R is the set of correct answer rank numbers in the output list.
  • This evaluation measure takes a maximum value of 1 when all correct answers are ranked at the top.
  • FIG. 7 is a table showing the results of this experimental example.
  • The experimental example ID6 is a baseline for the other experimental examples, in which nouns that frequently co-occur with the term information are extracted and used as related terms.
  • The baseline experimental example ID6 uses a method that has conventionally been used for acquiring related terms. As can be seen from the table in FIG. 7, comparing the experimental example ID1 with the experimental example ID2 shows that acquiring nouns from the groups of nouns in the claim gives better results than simply acquiring nouns.
  • As described above, according to the related term acquisition device 1 of the present embodiment, it is possible to acquire effective related term information by acquiring a part of the cited document information. Therefore, it is possible to provide the related term acquisition device 1 that can appropriately acquire the related term information related to the accepted term information.
  • a part of the cited document information acquired by the cited document information acquiring unit 14 may be any part as long as the related term information can be appropriately acquired.
  • For example, when the cited document information is information indicating a patent document, the part of the cited document information corresponding to the predetermined item acquired by the cited document information acquisition unit 14 may be the part of the independent claims.
  • An independent claim is a claim that is not dependent on other claims.
  • the cited document information acquisition unit 14 may determine whether a certain claim is an independent claim, for example, as follows.
  • The cited document information acquisition unit 14 determines whether or not the claim of interest contains a description referring to another claim, that is, a description such as “Claim N”. A claim containing no such description can be treated as an independent claim.
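  • The check for references to other claims can be sketched as follows. The regular expression assumes English phrasing such as "claim 1" or "claims 2"; Japanese claims would need a different pattern, and the names CLAIM_REFERENCE and is_independent are hypothetical.

```python
import re

# Matches a reference to another claim, e.g. "claim 1" or "Claims 2".
CLAIM_REFERENCE = re.compile(r"\bclaims?\s+\d+", re.IGNORECASE)


def is_independent(claim_text):
    """Treat a claim as independent when it contains no reference to another claim."""
    return CLAIM_REFERENCE.search(claim_text) is None


independent = is_independent("A storage device comprising a memory cell array.")
dependent = is_independent("The storage device according to claim 1, further comprising a decoder.")
```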
  • the cited document information is information indicating a patent document, and the portion of the cited document information corresponding to the item decided by the cited document information acquisition unit 14 that has been determined in advance has the effect of the invention.
  • the described part may be used.
  • the portion describing the effect of the invention may be, for example, the portion corresponding to the item “effect of the invention” or the content of the document. Good.
  • the cited document information is information indicating a patent document
  • the cited document information part corresponding to the item that is determined in advance by the cited document information acquiring unit 14 is other than the premise part of the claims. It may be a part.
  • the premise part of a claim is a part called a preamble of a claim, and is generally a part described before “in” or “in” in a claim.
  • The part other than the premise part of a claim is the part in which the characterizing description of the claim is made, and is generally the part described after “to” or “to” in the claim. If no preamble is described in the claim, the entire claim is the part other than the premise part.
  • the related term acquisition device further includes a cited document structure information acquisition unit.
  • FIG. 9 is a block diagram showing a configuration of the related term acquisition device 2 according to the present embodiment.
  • The related term acquisition device 2 includes a document information storage unit 11, a term information receiving unit 12, a document information acquiring unit 13, a cited document information acquiring unit 14, a related term information acquiring unit 15, a related term information output unit 16, and a cited document structure information acquiring unit 31. Configurations and operations other than those of the cited document structure information acquisition unit 31 are the same as in the first embodiment, and a description thereof is omitted.
  • The cited document structure information acquisition unit 31 acquires cited document structure information, which is information indicating the correspondence between one or more parts of the cited document information stored in the document information storage unit 11 and the items of those parts.
  • the cited document structure information may be any information structure as long as the item can be associated with a portion of the cited document information corresponding to the item.
  • The cited document structure information may indicate, by tags, an item and the part corresponding to the item. That is, in the cited document information, the region sandwiched between <effect> and </effect> may be the part corresponding to the item “effect”.
  • the cited document structure information may be stored in the document information storage unit 11.
  • the cited document structure information may have information indicating the portion of the cited document information corresponding to the item “effect” in association with the item “effect”.
  • The cited document structure information may be held in the document information storage unit 11 in association with the cited document information, or may be held in a recording medium (not shown) by the cited document structure information acquisition unit 31 or another component. In the latter case, it is preferable to store the cited document structure information in association with the cited document identification information.
  • the information indicating the cited document information part may be information indicating the position of the leading end and the position of the trailing end of the cited document information part, for example.
  • The positions of the leading end and the trailing end may be indicated, for example, by the number of bytes from the beginning of the cited document information, or by the page number and the line number.
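  • As a rough illustration of the two representations described above (tags in the text, and start and end positions held separately), the following sketch converts tagged regions into position pairs. The tag pattern, the function names, and the use of character offsets instead of byte counts are assumptions made for this example.

```python
import re
from typing import Dict, List, Tuple

# Assumption: items are marked with simple paired tags such as <effect>...</effect>.
TAGGED_REGION = re.compile(r"<(?P<item>\w+)>(?P<body>.*?)</(?P=item)>", re.DOTALL)

Structure = Dict[str, List[Tuple[int, int]]]


def structure_from_tags(text: str) -> Structure:
    """Map each item name to the (start, end) character offsets of its parts."""
    structure: Structure = {}
    for match in TAGGED_REGION.finditer(text):
        structure.setdefault(match.group("item"), []).append(match.span("body"))
    return structure


def parts_for_item(text: str, structure: Structure, item: str) -> List[str]:
    """Return the text of every part associated with the given item."""
    return [text[start:end] for start, end in structure.get(item, [])]


# Hypothetical cited document information, used only for illustration.
cited = "<title>Parser</title><effect>Accuracy improves.</effect>"
cited_structure = structure_from_tags(cited)
effect_parts = parts_for_item(cited, cited_structure, "effect")
```

  • Holding the positions separately, as in `cited_structure`, corresponds to the case where the structure information is stored apart from the cited document information itself.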
  • The cited document structure information acquisition unit 31 holds, in a recording medium (not shown), condition information indicating the condition corresponding to each item to be identified. It then identifies the part of the cited document information that satisfies the condition information, and acquires the cited document structure information by associating the item with that part. For example, the condition information corresponding to the item “effect” may be that the paragraph starts with “As above” or “Like this” and that the word “effect”, “possible”, or “yes” is included in the paragraph.
  • the cited document structure information acquisition unit 31 can acquire the cited document structure information by specifying a paragraph that satisfies the condition and associating the paragraph with the item “effect”.
  • The condition information corresponding to the item “characteristic description of claim” may be that the part is included in a claim and follows “is” and “in”.
  • the cited document structure information acquisition unit 31 can acquire the cited document structure information by specifying a portion that meets the condition and associating the portion with the item “characteristic description of claim”.
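  • The condition for the item “effect” can be sketched as below. The opening phrases and words are the translated examples quoted in the text; the paragraph splitting on blank lines, the case-insensitive substring matching, and all names are assumptions made for this illustration.

```python
from typing import List

# Translated example phrases taken from the condition described in the text.
EFFECT_OPENERS = ("As above", "Like this")
EFFECT_WORDS = ("effect", "possible", "yes")


def is_effect_paragraph(paragraph: str) -> bool:
    """True when the paragraph starts with an opener and contains an effect word."""
    text = paragraph.strip()
    if not text.startswith(EFFECT_OPENERS):
        return False
    lowered = text.lower()
    # Simple substring matching; a real system would likely tokenize the text.
    return any(word in lowered for word in EFFECT_WORDS)


def effect_paragraphs(document: str) -> List[str]:
    """Return the paragraphs (separated by blank lines) that satisfy the condition."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    return [p for p in paragraphs if is_effect_paragraph(p)]


# Hypothetical document text, used only to exercise the functions above.
sample_document = (
    "The device has a storage unit.\n\n"
    "As above, it is possible to obtain related terms.\n\n"
    "As above, the unit is described."
)
found = effect_paragraphs(sample_document)
```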
  • In step S103, the cited document information acquisition unit 14 uses the cited document structure information acquired by the cited document structure information acquisition unit 31 to acquire the part of the cited document information corresponding to the predetermined item.
  • (Step S501) The cited document structure information acquisition unit 31 determines whether or not the cited document structure information has already been acquired. If it has not been acquired, the process proceeds to step S502. If it has already been acquired, the process proceeds to step S102.
  • (Step S502) The cited document structure information acquisition unit 31 acquires the cited document structure information for all of the cited document information stored in the document information storage unit 11. Here, it is not known which of the document information stored in the document information storage unit 11 is cited document information until the document information is acquired. Therefore, the cited document structure information acquisition unit 31 acquires the cited document structure information for all the document information stored in the document information storage unit 11.
  • The cited document structure information may be acquired in advance while the related term acquisition device 2 is not performing related term acquisition processing, that is, while the processing load on the related term acquisition device 2 is small.
  • According to the related term acquisition device 2 of the present embodiment, since a part of the cited document information is acquired using the cited document structure information, a desired part of the cited document can easily be acquired even when the structure of the document is not made explicit in the cited document information.
  • The related term acquisition device acquires the part of the cited document information corresponding to an item determined by the citation format in which the cited document is cited in the document information.
  • the configuration of the related term acquisition device is the same as that shown in FIGS. 1 and 2 of Embodiment 1, and the description thereof is omitted.
  • The document information acquisition unit 13 acquires all or part of the document information, the acquired portion including at least the cited document identification information and the citation information.
  • The cited document information acquisition unit 14 acquires a part of the cited document information, which is information indicating the document identified by the cited document identification information included in all or part of the document information acquired by the document information acquisition unit 13. The acquired part corresponds to the item determined according to the citation format in the citation information included in all or part of that document information.
  • Although the cited document information acquisition unit 14 acquires the part corresponding to the item determined according to the citation format, it may acquire no information when the citation format is a predetermined format.
  • the citation information is information indicating a description of the document identified by the cited document identification information.
  • the citation information may exist in the same area as the cited document identification information, and the citation information may exist in a different area from the cited document identification information.
  • cited document identification information is described in the background art area.
  • the cited document is cited by the cited document identification information itself.
  • Such citations may be made in patent specifications and the like.
  • In another case, the cited document identification information is collectively described in a reference column, and the text of the document cites the cited document by a given index that corresponds to the cited document identification information described in the reference column.
  • The index may be, for example, 1, 2, 3, ...; 1), 2), 3), ...; [1], [2], [3], ...; [A], [B], [C], ...; patent document 1, patent document 2, patent document 3, ...; or non-patent document 1, non-patent document 2, non-patent document 3, .... Anything that can identify the cited document identification information can be used.
  • the cited information is information including the cited document identification information.
  • the information including cited document identification information is, for example, information on a predetermined area including cited document identification information.
  • The predetermined area may be, for example, a sentence, a paragraph, or some other unit of the document.
  • the cited information is information including the index.
  • the information including the index is, for example, information on a predetermined area including the index.
  • The predetermined area may be, for example, a sentence, a paragraph, or some other unit of the document.
  • The document information acquisition unit 13 may acquire a single part of the document information that includes both the cited document identification information and the citation information, or may acquire two or more parts of the document information that together include both of them.
  • The citation information may or may not be made explicit in all or part of the document information. In the former case, the citation information part can easily be identified in all or part of the document information.
  • The citation formats include, for example, a problem-pointing citation format (hereinafter also referred to as the “type C” citation format) that points out a problem of the cited document, an editorial-basis citation format (hereinafter also referred to as the “type B” citation format) that uses the cited document as the basis of an argument, and other citation formats (hereinafter also referred to as the “type O” citation format).
  • The type B citation format is a citation format used when a new theory is proposed or a system is constructed using existing research results. Note that other citation formats may also be used.
  • The cited document information acquisition unit 14 holds, for example, one or more cue phrases for each citation format in a recording medium (not shown), and may determine the citation format depending on whether a cue phrase is included at a predetermined position of the citation information.
  • The predetermined position may be, for example, the beginning of the citation information, the end of the citation information, or the whole of the citation information. Note that other methods may be used as long as the citation format can be determined from the citation information.
  • the overall operation of the related term acquisition apparatus 1 according to the present embodiment is the same as that of the flowchart of FIG.
  • The flowchart showing the details of the document information acquisition process of step S102 in the flowchart of FIG. 3 is also the same as the flowchart of FIG. 4 of the first embodiment, and a description thereof will be omitted.
  • the document information acquisition unit 13 also acquires citation information in step S204 of FIG.
  • the flowchart showing the details of the related term information acquisition process of step S104 in the flowchart of FIG. 3 is the same as the flowchart of FIG. 6 of the first embodiment, and a description thereof will be omitted.
  • FIG. 11 is a flowchart showing details of the cited document acquisition processing in step S103 in the flowchart of FIG.
  • the processes other than steps S601 to S603 are the same as those in the flowchart of FIG. 5 of the first embodiment, and the description thereof is omitted.
  • (Step S601) The cited document information acquisition unit 14 acquires all the cited document identification information included in the i-th document information acquired by the document information acquisition unit 13, together with the citation information corresponding to that cited document identification information. If the citation information part is not made explicit in all or part of the document information acquired by the document information acquisition unit 13, the cited document information acquisition unit 14 can identify the citation information part in the same manner as described for the citation information acquisition process by the document information acquisition unit 13.
  • (Step S602) The cited document information acquisition unit 14 determines the citation format in the citation information corresponding to the j-th cited document identification information. The determined citation format may be recorded, for example, by storing the j-th cited document identification information and information indicating the citation format in association with each other in a recording medium (not shown). Details of this processing will be described later using the flowchart of FIG.
  • (Step S603) The cited document information acquisition unit 14 searches the document information storage unit 11 for the cited document information identified by the j-th cited document identification information acquired in step S601, and acquires from the document information storage unit 11 the part corresponding to the citation format determined in step S602 as a part of the cited document information.
  • FIG. 12 is a flowchart showing details of the citation format determination process in step S602 in the flowchart of FIG. As described above, it is assumed that the cited document information acquisition unit 14 holds one or more cue phrases corresponding to each citation format in a recording medium (not shown).
  • (Step S701) The cited document information acquisition unit 14 sets a counter k to 1.
  • (Step S702) The cited document information acquisition unit 14 determines whether or not the citation information includes the k-th type C cue phrase. If it does, the process proceeds to step S703; if not, the process proceeds to step S704.
  • (Step S703) The cited document information acquisition unit 14 determines the citation format as type C. The series of processing then ends, and the process proceeds to step S603 in the flowchart of FIG.
  • (Step S704) The cited document information acquisition unit 14 increments the counter k by 1.
  • (Step S705) The cited document information acquisition unit 14 determines whether or not a k-th type C cue phrase exists. If it exists, the process returns to step S702. If it does not exist, the process proceeds to step S706.
  • (Step S706) The cited document information acquisition unit 14 sets the counter k to 1.
  • (Step S707) The cited document information acquisition unit 14 determines whether or not the citation information includes the k-th type B cue phrase. If it does, the process proceeds to step S708; otherwise, the process proceeds to step S709.
  • (Step S708) The cited document information acquisition unit 14 determines the citation format as type B. The series of processing then ends, and the process proceeds to step S603 in the flowchart of FIG.
  • (Step S709) The cited document information acquisition unit 14 increments the counter k by 1.
  • (Step S710) The cited document information acquisition unit 14 determines whether or not a k-th type B cue phrase exists. If it exists, the process returns to step S707. If it does not exist, the process proceeds to step S711.
  • (Step S711) The cited document information acquisition unit 14 determines the citation format as type O. The series of processing then ends, and the process proceeds to step S603 in the flowchart of FIG.
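  • The determination in steps S701 to S711 amounts to checking the type C cue phrases first, then the type B cue phrases, and falling back to type O. A minimal sketch follows; the function name and the example cue phrases are assumptions made for this illustration (the actual dictionaries are those of FIG. 13 and FIG. 14, which are not reproduced here), and the counter k is replaced by ordinary iteration.

```python
from typing import Sequence


def determine_citation_format(
    citation_info: str,
    type_c_cues: Sequence[str],
    type_b_cues: Sequence[str],
) -> str:
    """Return "C", "B", or "O" following the order of steps S701 to S711."""
    # Steps S701 to S705: try every type C cue phrase in turn.
    for cue in type_c_cues:
        if cue in citation_info:
            return "C"  # Step S703
    # Steps S706 to S710: try every type B cue phrase in turn.
    for cue in type_b_cues:
        if cue in citation_info:
            return "B"  # Step S708
    # Step S711: neither dictionary matched.
    return "O"


# Hypothetical cue phrases, chosen only for this example.
EXAMPLE_TYPE_C = ["However"]
EXAMPLE_TYPE_B = ["based on"]

format_1 = determine_citation_format(
    "A parser is proposed in [1]. However, it is slow.", EXAMPLE_TYPE_C, EXAMPLE_TYPE_B
)
format_2 = determine_citation_format(
    "Our system is based on [2].", EXAMPLE_TYPE_C, EXAMPLE_TYPE_B
)
format_3 = determine_citation_format("See also [3].", EXAMPLE_TYPE_C, EXAMPLE_TYPE_B)
```

  • Because type C is tested first, citation information that contains cue phrases of both kinds is classified as type C, which matches the order of the flowchart.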
  • The document information storage unit 11 stores about 12,000 full-text papers (document information), centered on the field of natural language processing, in PostScript and PDF formats. Of these, approximately 8,000 are included in the ACL Anthology provided by ACL, and the remaining approximately 4,000 consist of papers collected from the web pages of natural language processing researchers and natural language processing laboratories in Japan and overseas, and of paper data (document information) extracted from the proceedings of international conferences (CD-ROM).
  • The document information stored in the document information storage unit 11 in this specific example is a single type of document information, namely academic papers.
  • The document information acquisition unit 13 acquires, from the document information whose title includes the term indicated by the term information received by the term information reception unit 12, the information in the reference column of that document information and the corresponding citation information.
  • The cited document information acquisition unit 14 holds, in a recording medium (not shown), a type C cue phrase dictionary, which is information having the cue phrases corresponding to type C, and a type B cue phrase dictionary, which is information having the cue phrases corresponding to type B.
  • FIG. 13 is a diagram showing an example of a type C clue phrase dictionary.
  • FIG. 14 is a diagram showing an example of a type B cue phrase dictionary. The cited document information acquisition unit 14 determines that the citation format is type C when the citation information includes any of the cue phrases in the type C cue phrase dictionary, and determines that the citation format is type B when the citation information includes any of the cue phrases in the type B cue phrase dictionary.
  • The cited document information acquisition unit 14 acquires the title of the cited document as a part of the cited document. First, it is assumed that the user inputs the term information “terminology” to the related term acquisition device 1.
  • The term information “terminology” is accepted by the term information receiving unit 12 (step S101). Then, the document information acquisition unit 13 acquires the reference column and the citation information of each piece of document information that includes the term information “terminology” in its title (steps S102, S201 to S205).
  • the document information acquiring unit 13 searches each document information including the term information “terminology” in the title, and acquires the bibliographic information of the searched document information (step S201).
  • FIG. 15 is a diagram showing an example of the bibliographic information of the acquired document information. As shown in Fig. 15, the bibliographic information includes "author”, “title”, and “other”. “Author” is information about the author (s) of the paper, “Title” is information about the title of the paper, and “Other” is information such as the name and year of the journal. Note that in FIG. 15, “ID” is associated with each record. This “ID” exists for record management in the table.
  • From the document information identified by each record shown in FIG. 15, the document information acquisition unit 13 acquires the reference (Reference) column part and the citation information corresponding to the cited document identification information included in the reference column (step S204).
  • the document information acquiring unit 13 acquires a predetermined index corresponding to the cited document identification information described in the reference column from the reference column, and searches for a sentence including the index. .
  • The document information acquisition unit 13 then extracts the sentences that are closely related to the sentence including the index, using the following cue phrases (1) to (6).
  • If a sentence following the sentence including the index includes any of the following cue phrases (1) to (6), the document information acquisition unit 13 extracts the sentence containing that cue phrase as a sentence closely related to the sentence including the index.
  • The document information acquisition unit 13 acquires the sentence including the index and the sentences closely related to it as citation information.
  • The cited document information acquisition unit 14 determines the citation format based on the part of the document information acquired by the document information acquisition unit 13 (that is, the reference column part and the citation information corresponding to each piece of cited document identification information), and acquires the part of the cited document corresponding to the citation format (step S103, flowcharts of FIGS. 11 and 12).
  • Specifically, if the citation information includes any cue phrase in the type C cue phrase dictionary shown in FIG. 13, the citation format is determined as type C; if the citation information includes any cue phrase in the type B cue phrase dictionary shown in FIG. 14, the citation format is determined as type B; and if it is neither type C nor type B, the citation format is determined as type O (steps S701 to S711). Then, when the citation format is type C, the cited document information acquisition unit 14 acquires information indicating the title of the cited document information as a part of the cited document information.
  • When the citation format is type B, the cited document information acquisition unit 14 acquires information indicating a summary (abstract) of the cited document information as a part of the cited document information.
  • the cited document information acquisition unit 14 does not acquire a part of the cited document information when the citation format is type O.
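  • In this specific example, the part that is acquired depends only on the citation format: the title for type C, the abstract for type B, and nothing for type O. The following sketch expresses that selection as a lookup table; the dictionary representation of a cited document and all names are assumptions made for this illustration.

```python
from typing import Dict, Optional

# Item acquired for each citation format in the specific example; type O maps to nothing.
ITEM_BY_FORMAT: Dict[str, str] = {"C": "title", "B": "abstract"}


def acquire_part(cited_document: Dict[str, str], citation_format: str) -> Optional[str]:
    """Return the part of the cited document selected by the citation format."""
    item = ITEM_BY_FORMAT.get(citation_format)
    if item is None:
        return None  # Type O: no part of the cited document information is acquired.
    return cited_document.get(item)


# Hypothetical cited document, used only to exercise the function above.
example_document = {
    "title": "Automatic Term Recognition",
    "abstract": "We describe a method for recognizing terms.",
}
part_c = acquire_part(example_document, "C")
part_b = acquire_part(example_document, "B")
part_o = acquire_part(example_document, "O")
```

  • Keeping the mapping in a table makes it straightforward to associate other items, such as background or characteristic items, with each citation format.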
  • The related term information acquisition unit 15 acquires related term information from the set of parts of the cited document information acquired by the cited document information acquisition unit 14 (step S104, flowchart of FIG. 6). The related term information output unit 16 outputs the acquired related term information (step S105).
  • the method for obtaining the citation information is not limited to the specific description described above.
  • another method for acquiring citation information will be described.
  • The document information acquisition unit 13 acquires, from the reference column, a predetermined index corresponding to the cited document identification information described in the reference column, searches for a sentence including the index, and identifies that sentence. The document information acquisition unit 13 then acquires the sentence including the index and the sentences closely related to it as citation information.
  • The document information acquisition unit 13 can access two types of cue phrases, (A) pre-extension cue phrases and (B) post-extension cue phrases, stored in a recording medium (not shown).
  • The document information acquisition unit 13 first sets the sentence including the index as a candidate sentence for the citation information. If the first sentence of the candidate sentence includes at least one of (A) the pre-extension cue phrases, the sentence immediately before the candidate sentence is included in the candidate sentence. The document information acquisition unit 13 repeats this process of extending the candidate sentence backward until the first sentence of the candidate sentence includes none of (A) the pre-extension cue phrases. In addition, if the sentence following the candidate sentence (the sentence that is adjacent to the candidate sentence and not included in it) includes at least one of (B) the post-extension cue phrases, that following sentence is included in the candidate sentence.
  • The document information acquisition unit 13 repeats this process of extending the candidate sentence forward until the sentence following the candidate sentence includes none of (B) the post-extension cue phrases.
  • The candidate sentence at the time when the process of extending the candidate sentence is completed becomes the citation information.
  • In this way, the document information acquisition unit 13 acquires the citation information.
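  • The extension of the candidate sentence in both directions can be sketched as follows. The cue phrase lists passed to the function are hypothetical stand-ins for the (A) and (B) cue phrases, which are not reproduced in this text, and the function name and list-of-sentences representation are likewise assumptions made for this illustration.

```python
from typing import List, Sequence


def extract_citation_info(
    sentences: Sequence[str],
    index_position: int,
    pre_extension_cues: Sequence[str],
    post_extension_cues: Sequence[str],
) -> List[str]:
    """Grow the candidate around the sentence that contains the index."""
    start = end = index_position
    # Extend backward while the first candidate sentence contains an (A) cue phrase.
    while start > 0 and any(cue in sentences[start] for cue in pre_extension_cues):
        start -= 1
    # Extend forward while the sentence after the candidate contains a (B) cue phrase.
    while end + 1 < len(sentences) and any(
        cue in sentences[end + 1] for cue in post_extension_cues
    ):
        end += 1
    return list(sentences[start : end + 1])


# Hypothetical sentences and cue phrases, used only to exercise the function above.
example_sentences = [
    "Smith proposed a chart parser.",
    "This method was extended in [1].",
    "However, it is slow.",
    "We propose a new model.",
]
citation_info = extract_citation_info(
    example_sentences, 1, ["This method"], ["However"]
)
```

  • Here the sentence containing the index “[1]” pulls in the preceding sentence (because it starts with the assumed pre-extension cue) and the following sentence (because that sentence contains the assumed post-extension cue), while the last sentence is left out.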
  • The (A) pre-extension cue phrases and the (B) post-extension cue phrases are examples; other cue phrases may be used as long as the sentences closely related to the sentence containing the index can be appropriately extracted.
  • Although the sentence containing the index is used here, in technical documents that cite the cited document directly by the cited document identification information, the sentence containing the cited document identification information is used instead of the sentence containing the index.
  • the method of determining the citation format is not limited to the specific description described above.
  • another method for determining the citation format will be described.
  • Instead of determining the citation format based only on whether or not a predetermined cue phrase is included in the citation information, the document information acquisition unit 13 may determine the citation format by also taking into account the position of the cue phrase in the citation information. More specifically, assume for example that in the type C cue phrase dictionary, as shown in FIG. 20, each type C cue phrase (“cue phrase” in FIG. 20) is associated with the position of that cue phrase.
  • The document information acquisition unit 13 may then determine that the citation format of the citation information is type C when a type C cue phrase is included in the sentence indicated by the position associated with that cue phrase.
  • For example, if “However” is included in any sentence of the citation information from the next sentence (first sentence) to the third sentence after the sentence including the index or the cited document identification information, the document information acquisition unit 13 may determine that the citation format of the citation information is type C. Although the method for determining whether the citation format is type C has been described here, the method for determining whether the citation format is type B is the same.
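  • A position-aware check of this kind can be sketched as follows. The “However” entry with a range of one to three sentences after the citing sentence follows the example in the text; the tuple layout of the dictionary entries and all names are assumptions made for this illustration, since FIG. 20 is not reproduced here.

```python
from typing import Sequence, Tuple

# Each entry: (cue phrase, first offset, last offset), where offsets are counted in
# sentences after the sentence that contains the index or the identification information.
PositionedCue = Tuple[str, int, int]
EXAMPLE_TYPE_C_CUES: Sequence[PositionedCue] = [("However", 1, 3)]


def matches_positioned_cues(
    sentences: Sequence[str],
    citing_position: int,
    cues: Sequence[PositionedCue],
) -> bool:
    """True when some cue phrase occurs in a sentence within its associated range."""
    for cue, first_offset, last_offset in cues:
        for offset in range(first_offset, last_offset + 1):
            position = citing_position + offset
            if 0 <= position < len(sentences) and cue in sentences[position]:
                return True
    return False


# Hypothetical citation information, used only to exercise the function above.
example_a = ["A parser is described in [1].", "It uses a chart.", "However, it is slow."]
example_b = ["However, prior work is limited.", "A parser is described in [1]."]
is_type_c_a = matches_positioned_cues(example_a, 0, EXAMPLE_TYPE_C_CUES)
is_type_c_b = matches_positioned_cues(example_b, 1, EXAMPLE_TYPE_C_CUES)
```

  • In the second example “However” appears before the citing sentence, so it falls outside the associated range and does not count as a type C match in this sketch.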
  • As described above, according to the related term acquisition device 1 of the present embodiment, it is possible to acquire the part of the cited document information corresponding to the item determined by the citation format of the cited document. Accordingly, by appropriately setting which item's part of the cited document information is to be acquired, related term information can be obtained with higher accuracy.
  • the citation format is type C
  • information indicating the title of the cited document information is acquired
  • the citation format is type B.
  • The cited document information acquisition unit 14 may acquire the part of the cited document information having background items when the citation format in the citation information included in all or part of the document information acquired by the document information acquisition unit 13 is the problem-pointing citation format, and may acquire the part of the cited document information having characteristic items when that citation format is the editorial-basis citation format.
  • The background items may be, for example, “conventional technology”, “background technology”, and the like in patent documents, and “introduction”, “related research”, and the like in academic papers.
  • The characteristic items may be, for example, “embodiments”, “effects”, and the like in patent documents.
  • the method described in the first and second embodiments can be used as a method for obtaining the portion of the reference document information corresponding to these items.
  • The related term acquisition device 2 may execute the process of acquiring the cited document information part according to the citation format.
  • the document information stored in the document information storage unit 11 may be of two or more types.
  • For example, the document information storage unit 11 may store at least a first type of document information and a second type of document information. In that case, the document information acquisition unit 13 may acquire all or part of document information from the first type of document information stored in the document information storage unit 11, and the cited document information acquisition unit 14 may acquire a part of the cited document information from the second type of document information stored in the document information storage unit 11.
  • the first type of document information may be information on academic papers
  • the second type of document information may be patent document information or vice versa.
  • the first type of document information may be information of the first type of language
  • the second type of document information may be information of the second type of language.
  • the first type language and the second type language may be any language such as English, Japanese, Chinese, French, German, and the like. In this way, for example, it may be possible to obtain English related terms related to Japanese terms.
  • FIG. 16 is a block diagram showing the configuration of the technology trend information creation device according to this embodiment.
  • The technical trend information creation device 1001 includes a search term reception unit 1011, a document information storage unit 1012, a search unit 1013, a specification unit 1014, a document information acquisition unit 1015, a technical term extraction unit 1016, a time information acquisition unit 1017, a technology trend information creation unit 1018, and an output unit 1019.
  • the search term receiving unit 1011 receives a search term.
  • the search term is information used as a search keyword in a search by the search unit 1013 described later.
  • the search term accepted by the search term receiving unit 1011 is actually information indicating the search term, but is simply referred to as a search term.
  • The technical trend information creation device 1001 searches for technical documents based on the search term received by the search term receiving unit 1011, acquires information indicating the retrieved technical documents, and creates the technical trend information based on that information. The search term is thus used to determine the field or the like for which technical trend information is created. The search term is typically text data, but may be other information.
  • The search term may be, for example, information indicating the field of the technology trend information to be created, or information indicating the period of the technology trend information to be created.
  • The search term may be, for example, a symbol associated with the field or the name of the field. Examples of symbols associated with fields include the IPC (International Patent Classification) and the US class (US patent classification).
  • search term receiving unit 1011 receives a search term indicating the name of the field of technology trend information.
  • The search term accepting unit 1011 may, for example, accept information input from an input device (e.g., a keyboard, mouse, or touch panel), receive information transmitted via a wired or wireless communication line, or accept information read from a predetermined recording medium (e.g., an optical disk, magnetic disk, or semiconductor memory).
  • the search term receiving unit 1011 may or may not include a device (for example, a modem or a network card) for receiving.
  • the search term acceptance unit 1011 may be implemented by hardware. Alternatively, it may be realized by software such as a driver that drives a predetermined device.
  • the literature information is information indicating at least a part of the technical literature.
  • The technical literature includes, for example, technical papers, patent and utility model publications, published technical bulletins, technical documents and articles, technical web pages, and the like.
  • One document information includes at least part of information (data) of the corresponding one technical document.
  • the part of the technical document is a part of the technical document that includes at least a range necessary for subsequent processing.
  • For example, the document information may be information indicating the title of the technical document and its references.
  • The document information may also be information indicating the entire technical document.
  • time information is information which shows the time regarding the technical literature which literature information shows.
  • The time information may be, for example, information indicating the time when the document information was created, or information indicating the time when the document information was published.
  • The time information may be, for example, information indicating the year related to the document information, information indicating a more precise date (for example, year and month, or year, month, and day), or information indicating some other time. It is assumed that the document information and the time information are directly or indirectly associated with each other so that it can be understood which technical document belongs to which time. In the former case, for example, document information and time information corresponding to each other may be included in the same record. In the latter case, for example, document information and time information may be associated with each other through a document ID for identifying a technical document.
  • The document information and the time information may be text files, or may be files in a proprietary format such as that of word-processing software.
  • the document information and the time information may be separate data or may be integrated as data.
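  • The direct and indirect associations between document information and time information described above can be sketched as follows. This is only an illustrative sketch: the names `DirectRecord`, `document_table`, `time_table`, and `time_of`, as well as the sample IDs and years, are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirectRecord:
    """Direct association: document information and time in the same record."""
    doc_id: str
    title: str
    year: int


# Indirect association: two separate tables linked through a document ID.
# All IDs, titles, and years here are made-up sample data.
document_table = {"D1": {"title": "Sentence extraction using SVM"}}
time_table = {"D1": 2001}


def time_of(doc_id: str) -> Optional[int]:
    """Look up the time information for a document via its document ID."""
    return time_table.get(doc_id)


direct = DirectRecord("D2", "Sentence shortening for caption generation", 2003)
```

  • Either layout lets later processing determine which technical document belongs to which time; the indirect layout simply requires one extra lookup by document ID.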
  • The document information stored in the document information storage unit 1012 may include, for example, cited document identification information and citation information.
  • Cited document identification information is information for identifying another technical document cited in the document information (hereinafter, such a technical document may be referred to as a "cited document"). Citation information is information indicating the description concerning another technical document cited in the document information that includes the citation information. The description is, for example, a description pointing out a problem of the cited technical document, or a description proposing a new theory or constructing a system based on the cited technical document.
  • the cited document identification information may be any information as long as it is information that can identify the cited document.
  • The cited document identification information may be, for example, an ID for identifying a cited document, such as a patent number, the publication number of a patent application, or a PMID (PubMed Identifier), or it may be bibliographic information that can identify a cited document, such as the author's name, title, date of publication, and medium information (journal name, publisher name, URL, etc.). In the present embodiment, it is assumed that the document information includes at least a title, cited document identification information, and citation information.
  • the cited document identification information and the cited information will be described in more detail.
  • The citation information may exist in the same area as the cited document identification information, or may exist in a different area from the cited document identification information.
  • In the former case, for example, the cited document identification information is described in the background art section, and the cited document is cited by the cited document identification information itself.
  • Such citations may be made in patent specifications.
  • In the latter case, the cited document identification information is listed together in a references section, and the cited document is cited in the body of the document by a predetermined index corresponding to the cited document identification information listed in the references section.
  • The index may be anything that can identify the cited document identification information, for example: 1, 2, 3, ...; 1), 2), 3), ...; [1], [2], [3], ...; [A], [B], [C], ...; Patent Document 1, Patent Document 2, Patent Document 3, ...; or Non-Patent Document 1, Non-Patent Document 2, Non-Patent Document 3, ....
  • In the former case, the citation information is information including the cited document identification information.
  • The information including the cited document identification information is, for example, information in a predetermined area that includes the cited document identification information.
  • The predetermined area may be, for example, a sentence.
  • In the latter case, the citation information is information including the index.
  • The information including the index is, for example, information in a predetermined area that includes the index.
  • The predetermined area may also be, for example, a paragraph or another block of text other than a sentence.
  • the predetermined area may be determined based on a predetermined rule, as will be described later.
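  • As a rough illustration of taking a sentence as the predetermined area, the following sketch returns every sentence of a body text that contains a given index such as "[1]". The function name `citation_sentences` and the simple punctuation-based sentence splitting are assumptions made for this example; the specification does not prescribe how the area is delimited.

```python
import re


def citation_sentences(body: str, index: str) -> list[str]:
    """Return the sentences of `body` that contain the citation index.

    Sentences are split naively after '.', '!' or '?' followed by
    whitespace; this is an assumed rule, not one from the specification.
    """
    sentences = re.split(r"(?<=[.!?])\s+", body.strip())
    return [s for s in sentences if index in s]


body = "Prior work [1] used rules. We extend it. However, [2] is slow."
print(citation_sentences(body, "[1]"))
```

  • The same function could be applied with the cited document identification information itself (for example, a patent number) in place of the index.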
  • The citation format includes, for example, a problem-pointing citation format (hereinafter also referred to as the "type C" citation format), which points out a problem of the cited document, and a citation format that adopts the cited document as the basis of the argument (hereinafter also referred to as the "type B" citation format).
  • The type B citation format is used when a new theory is proposed or a system is constructed on the basis of existing research results. Citations may also be classified into other citation formats.
  • the citation form of the above description is described in the following document.
  • the document information storage unit 1012 can be realized by a predetermined recording medium (for example, a semiconductor memory, a magnetic disk, an optical disk, or the like).
  • the document information storage unit 1012 may store any document information.
  • For example, document information may be stored in the document information storage unit 1012 via a recording medium, or document information transmitted via a communication line or the like may be stored in the document information storage unit 1012.
  • document information input via an input device may be stored in the document information storage unit 1012.
  • the search unit 1013 searches the document information storage unit 1012 using the search term received by the search term receiving unit 1011. It is preferable that the search unit 1013 searches for document information corresponding to the technical document whose search term received by the search term receiving unit 1011 is the main topic.
  • “Document information corresponding to a technical document” means document information indicating at least a part of the technical document.
  • The main topic is the name of a research field. Its breadth can vary depending on the area of interest. For example, "morphological analysis" can be a main topic, but "natural language processing", which includes it, can also be a main topic.
  • The search unit 1013 may search only a part of each piece of document information stored in the document information storage unit 1012 (for example, only the title), or may search information other than document information.
  • the document information has a structure capable of specifying a part of the document information.
  • the item name may be described for each part of the document information, or the item may be specified using a markup language tag or the like.
  • "Information other than document information" may be, for example, time information or other bibliographic information.
  • The search unit 1013 searches for, for example, document information in which the search term is described more than a predetermined number of times, or document information in which the search term is described in a plurality of predetermined areas (for example, a title and a summary).
  • a method for searching for document information corresponding to a technical document having a certain search term as a main topic is already known in the following documents, and the detailed description thereof is omitted.
  • The search unit 1013 may also search for document information in which the search term received by the search term receiving unit 1011 is included in the area indicating the main topic within the title of the technical document indicated by the document information stored in the document information storage unit 1012.
  • the area indicating the main topic may be, for example, the last area of the title in a Japanese title, or the first area of the title in an English title.
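  • The three search criteria mentioned above (frequency threshold, presence in several predetermined areas, and position within the title) can be sketched as simple predicates. This is a minimal sketch under assumptions: the function names, the dictionary layout with `title` and `abstract` keys, and the use of plain prefix/suffix matching to approximate "first area" and "last area" of a title are all hypothetical simplifications.

```python
def described_more_than(text: str, term: str, threshold: int) -> bool:
    """True if the term occurs more than `threshold` times in the text."""
    return text.lower().count(term.lower()) > threshold


def described_in_all_areas(doc: dict, term: str,
                           areas=("title", "abstract")) -> bool:
    """True if the term occurs in every one of the given areas."""
    return all(term.lower() in doc.get(area, "").lower() for area in areas)


def title_names_topic(title: str, term: str, language: str) -> bool:
    """Approximate the 'area indicating the main topic' of a title.

    Japanese titles: the topic is assumed to sit at the end of the title.
    English titles: the topic is assumed to sit at the beginning.
    """
    if language == "ja":
        return title.strip().endswith(term)
    return title.strip().lower().startswith(term.lower())


print(title_names_topic("Sentence extraction using SVM",
                        "sentence extraction", "en"))
```

  • In practice the area indicating the main topic would likely be determined with morphological or syntactic analysis rather than raw string position; the string checks here only illustrate the idea.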
  • search processing itself performed by the search unit 1013 is already known, and thus detailed description thereof is omitted.
  • one or more document information is generally specified as the search result.
  • The identification may be performed, for example, by accumulating information for identifying the document information in a recording medium such as a temporary storage memory, or by setting a flag in association with the document information.
  • The identifying unit 1014 identifies document information that has the same field as the document information retrieved by the search unit 1013. Through the identification by the identifying unit 1014, document information that the search unit 1013 did not retrieve, but that has the same field as the document information retrieved by the search unit 1013, can also be used to create the technology trend information.
  • As the identification method, there are, for example, (1) a method using citation relations and (2) a method using document classification or document clustering.
  • Method (1) identifies the document information corresponding to a technical document cited in the document information retrieved by the search unit 1013, on the assumption that it has the same field as the document information retrieved by the search unit 1013. For identifying the document information in this way, there are, for example, (1-1) a method using only the cited document identification information and (1-2) a method using the cited document identification information and the citation format.
  • In method (1-1), the identifying unit 1014 acquires, from the document information retrieved by the search unit 1013, the cited document identification information, which is information for identifying another technical document cited in that document information. The document information is then identified by the acquired cited document identification information.
  • For example, the identifying unit 1014 may acquire the references section, which is considered to include the cited document identification information, or may acquire the patent document and non-patent document sections.
  • the area where the cited document identification information exists may be acquired by using a technique such as pattern matching.
  • When the references section, the patent document section, or the like is acquired, for example, information indicating the item corresponding to the part to be acquired may be stored in a recording medium (not shown), and the identifying unit 1014 may read the information indicating the item from the recording medium and acquire the area of the document information corresponding to the item.
  • the specifying unit 1014 may acquire an area including the cited document identification information in the document information by reading a pattern or the like from the recording medium and performing pattern matching or the like.
  • the cited document identification information acquired by the specifying unit 1014 may be temporarily stored in a recording medium (not shown).
  • In method (1-2), the identifying unit 1014 acquires the cited document identification information from the document information retrieved by the search unit 1013, and also acquires citation information, which is information indicating the description concerning another technical document cited in that document information. If the citation format indicated by the acquired citation information is the problem-pointing citation format (type C citation format), which points out a problem of the cited technical document, the document information is identified by the cited document identification information that identifies the technical document described in the citation information; otherwise, the document information need not be identified.
  • The identifying unit 1014 may, for example, hold one or more clue phrases corresponding to the type C citation format in a recording medium (not shown), and determine whether the citation format is type C according to whether a clue phrase is included at a predetermined position in the citation information.
  • The predetermined position may be, for example, the beginning or the end of the citation information, the whole of the citation information, or a predetermined range within the sentence in which the citation is made (for example, the sentence that includes the cited document identification information or the sentence that includes the index corresponding to the cited document identification information).
  • The predetermined position may be set for each clue phrase. Other methods may be used as long as the citation format can be determined from the citation information.
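  • The clue-phrase check for the type C citation format can be sketched as below. The clue phrases and their positions in `TYPE_C_CLUES` are invented for illustration only (the specification does not list concrete clue phrases), and `is_type_c` is a hypothetical helper name.

```python
# Hypothetical clue phrases, each paired with the position at which it
# must appear: "start", "end", or "any" (anywhere in the citation info).
TYPE_C_CLUES = [
    ("however", "start"),
    ("problem", "any"),
    ("is not addressed", "end"),
]


def is_type_c(citation_info: str, clues=TYPE_C_CLUES) -> bool:
    """Return True if any clue phrase occurs at its predetermined position."""
    text = citation_info.lower().strip().rstrip(".")
    for phrase, position in clues:
        if position == "start" and text.startswith(phrase):
            return True
        if position == "end" and text.endswith(phrase):
            return True
        if position == "any" and phrase in text:
            return True
    return False


print(is_type_c("However, the method in [1] requires labeled data."))
```

  • Storing the position together with each clue phrase mirrors the statement that the predetermined position may be set for each clue phrase.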
  • When the cited document is referred to by the cited document identification information itself, the identifying unit 1014 identifies the area that refers to the cited document using the cited document identification information, and acquires citation information including the identified area. When the cited document is referred to by an index corresponding to the cited document identification information, the identifying unit 1014 may identify the referring area using that index and acquire citation information including the identified area.
  • the citation information acquired by the specifying unit 1014 may be temporarily stored in a recording medium (not shown).
  • The identifying unit 1014 acquires the citation information and, when it determines that the citation format of the citation information is type C, identifies the document information by the cited document identification information that identifies the technical document described in the citation information.
  • When the cited document is referred to in the citation information by the cited document identification information, the document information may be identified by that cited document identification information.
  • When the cited document is referred to in the citation information by an index, the cited document identification information corresponding to the index is identified, and the document information is identified by the identified cited document identification information.
  • The identifying unit 1014 may acquire the cited document identification information and the like from the document information storage unit 1012. Alternatively, when the document information acquisition unit 1015 described later acquires document information that includes the cited document identification information and the like, the identifying unit 1014 may acquire the cited document identification information and the like from the document information acquired by the document information acquisition unit 1015. In the present embodiment, the former case will be described.
  • The document information acquisition unit 1015 acquires all or part of at least some of the plural pieces of document information stored in the document information storage unit 1012.
  • the document information acquisition unit 1015 acquires all or part of the document information searched by the search unit 1013.
  • a part of the document information may be, for example, a title or a summary of the document information.
  • For example, information indicating the item corresponding to the part to be acquired may be stored in a recording medium (not shown), and the document information acquisition unit 1015 may read the information indicating the item from the recording medium and acquire the area of the document information corresponding to the item as the part of the document information.
  • When the search unit 1013 searches part of the document information, the part of the document information to be acquired and the part of the document information used in the search may be the same or different.
  • The document information acquisition unit 1015 also acquires, from the document information storage unit 1012, all or part of the document information identified by the identifying unit 1014. For example, when the document information is identified by the cited document identification information, the document information acquisition unit 1015 searches the document information storage unit 1012 using the cited document identification information and acquires all or part of the retrieved document information.
  • the document information area acquired by the document information acquiring unit 1015 using the specified result by the specifying unit 1014 is the same as the document information area acquired by the document information acquiring unit 1015 using the search result by the searching unit 1013.
  • the document information acquisition unit 1015 acquires the title of the specified document information.
  • The document information acquisition unit 1015 may temporarily store, on a recording medium, all or part of the document information acquired according to the search result or the identification result.
  • The technical term extraction unit 1016 extracts technical terms from all or part of the document information acquired by the document information acquisition unit 1015.
  • Here, "all or part of the document information acquired by the document information acquisition unit 1015" reflects the fact that the document information acquisition unit 1015 may acquire document information either in part or in whole.
  • the technical term is a term used in the technical literature field searched by the search unit 1013.
  • This technical term is preferably a technical term that is an element in the technical literature field searched by the search unit 1013. By looking at the transition of this technical term, it is possible to know the technical trend.
  • The technical terms extracted by the technical term extraction unit 1016 are actually information indicating technical terms, but are simply referred to as technical terms. A method by which the technical term extraction unit 1016 extracts technical terms will be described later.
  • The technical term extraction unit 1016 may temporarily store the extracted technical terms on a recording medium (not shown).
  • The time information acquisition unit 1017 acquires, from the document information storage unit 1012, the time information corresponding to the document information that includes the technical terms extracted by the technical term extraction unit 1016. For example, the time information acquisition unit 1017 can acquire this time information by acquiring the time information corresponding to the document information acquired by the document information acquisition unit 1015. In this case, if no technical term is extracted from a piece of document information, the time information corresponding to that document information need not be acquired, or, if acquired, need not be used in creating the technical trend information described later.
  • The technical trend information creation unit 1018 creates technical trend information using the technical terms extracted by the technical term extraction unit 1016 and the time information acquired by the time information acquisition unit 1017.
  • the technology trend information is information that associates a technical term with a period related to technical literature including the technical term. In other words, it is possible to know when a certain technical term was used by the technology trend information.
  • The time associated with the technical term may be the time information itself (for example, a numerical value such as the year or month) or something other than the time information (for example, a figure corresponding to the year, month, etc.). In the latter case, for example, the time associated with the technical term may be indicated using a coordinate axis representing time.
  • the data format is not limited.
  • the file may be a text file, a spreadsheet file, an image file, or another file.
  • This technical trend information may be, for example, an image indicating the time indicated by the time information corresponding to the document information including the technical term for each technical term.
  • the technical trend information may include the search terms accepted by the search term acceptance unit 1011.
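  • One possible shape for technology trend information is a mapping from each technical term to the years of the documents that contain it, optionally rendered along a time axis. The sketch below assumes year-granularity time information and uses the hypothetical names `create_trend_info` and `render_timeline`; the sample terms and years are made up.

```python
from collections import defaultdict


def create_trend_info(term_doc_pairs, doc_times):
    """Associate each technical term with the years of documents using it.

    term_doc_pairs: iterable of (technical_term, doc_id)
    doc_times: mapping doc_id -> year (the time information)
    """
    trend = defaultdict(set)
    for term, doc_id in term_doc_pairs:
        year = doc_times.get(doc_id)
        if year is not None:  # skip documents without time information
            trend[term].add(year)
    return {term: sorted(years) for term, years in trend.items()}


def render_timeline(trend, start, end):
    """Render one text row per term; '*' marks a year in which it was used."""
    rows = []
    for term in sorted(trend):
        marks = "".join("*" if year in trend[term] else "."
                        for year in range(start, end + 1))
        rows.append(f"{term:<20} {marks}")
    return "\n".join(rows)


pairs = [("SVM", "D1"), ("SVM", "D2"), ("HMM", "D3")]
times = {"D1": 2001, "D2": 2003, "D3": 1999}
trend = create_trend_info(pairs, times)
print(render_timeline(trend, 1999, 2003))
```

  • The dictionary corresponds to using the time information itself, while the rendered rows correspond to indicating the time along a coordinate axis.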
  • the output unit 1019 outputs the technology trend information created by the technology trend information creation unit 1018.
  • This output may be, for example, display on a display device (for example, a CRT or a liquid crystal display), transmission via a communication line to a predetermined device, printing by a printer, or accumulation on a recording medium.
  • the output unit 1019 may or may not include an output device (for example, a display device or a printer). Further, the output unit 1019 may be realized by hardware, or may be realized by software such as a driver that drives these devices. For example, when the technology trend information is an image, the output unit 1019 displays and outputs the technology trend information.
  • The language used in the technical trend information creation device 1001 is not limited. For example, it may be Japanese, English, French, Chinese, or Korean.
  • Step S1101 The search term receiving unit 1011 determines whether or not the search term is received. If the search term is accepted, the process proceeds to step S1102, and if not, the process of step S1101 is repeated until the search term is accepted.
  • Step S1102 The search unit 1013 searches the document information storage unit 1012 using the search term received by the search term receiving unit 1011. For example, the search unit 1013 may acquire a document ID for identifying the retrieved document information.
  • Step S1103 The identifying unit 1014 identifies the document information that has the same field as the document information searched by the search unit 1013. Details of this process will be described later.
  • Step S1104 The document information acquisition unit 1015 acquires, from the document information storage unit 1012, all or part of the document information retrieved by the search unit 1013 and all or part of the document information identified by the identifying unit 1014. Details of this process will be described later.
  • Step S1105 The technical term extraction unit 1016 extracts technical terms from all or part of the document information acquired by the document information acquisition unit 1015. Details of this process will be described later.
  • Step S1106 The time information acquisition unit 1017 acquires the time information corresponding to the document information including the technical term extracted by the technical term extraction unit 1016 from the document information storage unit 1012.
  • Step S1107 The technical trend information creation unit 1018 creates technical trend information using the technical terms extracted by the technical term extraction unit 1016 and the time information acquired by the time information acquisition unit 1017.
  • Step S1108 The output unit 1019 outputs the technical trend information created by the technical trend information creation unit 1018. Then, the process returns to step S1101. If the technology trend information is output continuously, as in the case of display output, the output may be terminated after a certain period of time, or may be terminated when triggered by the receipt of a termination instruction, interrupt processing, or the like.
  • If nothing is retrieved or acquired in the search process (step S1102), the document information acquisition process (step S1104), the technical term extraction process (step S1105), the time information acquisition process (step S1106), or the like in the flowchart of FIG. 17, the process may return to step S1101 as an error.
  • If no document information is identified in step S1103, the process may proceed to step S1104 or may return to step S1101. In the present embodiment, the former case will be described.
  • The order of processing in the flowchart of FIG. 17 is arbitrary to some extent. For example, the process of acquiring the time information may be performed before the process of acquiring the document information or the process of extracting the technical terms, as long as it is performed after the process of identifying the document information.
  • Next, the detailed processing of the document information identification process (step S1103) in the flowchart of FIG. 17 will be described with reference to the corresponding flowchart.
  • As described above, there are various methods for identifying document information in step S1103. Here, a method for identifying document information using the cited document identification information and the citation format will be described.
  • Step S1201 The identifying unit 1014 sets the counter i to 1.
  • Step S1202 The identifying unit 1014 acquires all the cited document identification information, and the citation information corresponding to each piece of cited document identification information, from the i-th document information retrieved by the search unit 1013. As described above, the cited document identification information may or may not be included in the citation information. In this step, if the i-th document information retrieved by the search unit 1013 does not contain any cited document identification information, the process proceeds to step S1203 without acquiring any cited document identification information.
  • Step S1203 The identifying unit 1014 sets the counter j to 1.
  • Step S1204 The identifying unit 1014 determines whether or not the j-th cited document identification information acquired from the retrieved i-th document information exists. If it exists, the process proceeds to step S1205; if not, it proceeds to step S1208.
  • Step S1205 The identifying unit 1014 determines whether or not the citation format of the citation information corresponding to the j-th cited document identification information is type C. If it is type C, the process proceeds to step S1206; if not, it proceeds to step S1207.
  • Step S1206 The identifying unit 1014 temporarily stores the j-th cited document identification information in a recording medium (not shown). The document information identified by the temporarily stored cited document identification information becomes the identified document information.
  • Step S1207 The identifying unit 1014 increments the counter j by 1. Then, return to step S1204.
  • Step S1208 The identifying unit 1014 increments the counter i by 1.
  • Step S1209 The identifying unit 1014 determines whether or not the i-th document information searched by the search unit 1013 exists. If it exists, return to step S1202. If not, the series of processes for specifying the document information ends, and the process proceeds to step S1104.
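  • Steps S1201 to S1209 amount to a nested loop over the retrieved document information and the citations in each. The sketch below replaces the counters i and j with `for` loops and marks the corresponding steps in comments. The data layout (a `citations` list of pairs), the function name `identify_documents`, and the removal of duplicate identifiers are assumptions of this example; the type C classifier is passed in as a parameter because the specification allows various determination methods.

```python
def identify_documents(searched_docs, is_type_c):
    """Collect cited document IDs whose citation format is type C.

    searched_docs: list of dicts, each with an optional "citations" list of
                   (cited_document_id, citation_info) pairs (assumed layout).
    is_type_c: callable deciding whether a citation info string is type C.
    """
    identified = []  # plays the role of the temporary recording medium
    for doc in searched_docs:                                     # S1201, S1208, S1209
        for cited_id, citation_info in doc.get("citations", []):  # S1202-S1204, S1207
            if is_type_c(citation_info):                          # S1205
                if cited_id not in identified:  # de-duplication: an added assumption
                    identified.append(cited_id)                   # S1206
    return identified


docs = [
    {"citations": [("P1", "However, [1] cannot handle long sentences."),
                   ("P2", "We use the parser of [2].")]},
    {"citations": []},
    {"citations": [("P1", "However, P1 is slow.")]},
]
print(identify_documents(docs, lambda s: s.lower().startswith("however")))
```

  • A document with no cited document identification information simply contributes nothing, which matches the behavior described for step S1202.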
  • Next, the detailed processing of the document information acquisition process (step S1104) in the flowchart of FIG. 17 will be described with reference to a flowchart.
  • Step S1301 The document information acquisition unit 1015 acquires all or part of each piece of document information indicated by the search result of the search unit 1013. Details of this processing will be described later.
  • Step S1302 The document information acquisition unit 1015 determines whether document information has been identified by the identifying unit 1014. If document information has been identified, the process proceeds to step S1303. If not, the series of processes for acquiring the document information ends, and the process proceeds to step S1105.
  • Step S1303 The document information acquisition unit 1015 acquires all or part of the document information identified by the identifying unit 1014. Details of this process will be described later. When this process ends, the series of processes for acquiring the document information ends, and the process proceeds to step S1105.
  • Step S1401 The literature information acquisition unit 1015 sets the counter i to 1.
  • Step S1402 The document information acquisition unit 1015 acquires all or part of the document information indicated by the i-th search result.
  • Step S1403 The document information acquisition unit 1015 temporarily stores all or part of the document information acquired in step S1402 on a recording medium (not shown).
  • This recording medium may be included in the document information acquisition unit 1015 or in another constituent element.
  • Step S1404 The literature information acquisition unit 1015 increments the counter i by 1.
  • Step S1405 The document information acquisition unit 1015 determines whether or not the i-th search result exists. If it exists, the process returns to step S1402. If not, the series of processes for acquiring all or part of each piece of document information indicated by the search result ends, and the process proceeds to step S1302.
  • Step S1501 The literature information acquisition unit 1015 sets the counter i to 1.
  • Step S1502 The document information acquisition unit 1015 determines whether all or part of the i-th document information identified by the identifying unit 1014 has not yet been acquired. If it has not been acquired, the process proceeds to step S1503; if it has already been acquired, the process proceeds to step S1505. This determination can be made, for example, by referring to the recording medium (not shown) in which the document information acquired by the document information acquisition unit 1015 in step S1403 is temporarily stored.
  • Step S1503 The document information acquisition unit 1015 acquires all or part of the i-th document information identified by the identifying unit 1014.
  • Step S1504 The document information acquisition unit 1015 temporarily stores all or part of the document information acquired in step S1503 on a recording medium (not shown).
  • This recording medium may be included in the document information acquisition unit 1015 or in another constituent element.
  • Step S1505 The literature information acquisition unit 1015 increments the counter i by 1.
  • Step S1506 The document information acquisition unit 1015 determines whether or not the i-th document information identified by the identifying unit 1014 exists. If it exists, the process returns to step S1502. If not, the series of processes for acquiring the identified document information ends, and the process proceeds to step S1105.
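  • The two acquisition loops (steps S1401 to S1405 for the search results, and steps S1501 to S1506 for the identified document information) can be sketched together as follows. The function name `acquire_documents`, the dictionary-based store, and the choice to silently skip identified IDs that are absent from the store are assumptions made for this illustration.

```python
def acquire_documents(store, search_result_ids, identified_ids, part="title"):
    """Acquire one part of each retrieved and each identified document.

    store: mapping doc_id -> dict of document parts (assumed layout)
    Returns a mapping doc_id -> acquired part, acting as temporary storage.
    """
    acquired = {}
    for doc_id in search_result_ids:       # S1401-S1405: search results
        acquired[doc_id] = store[doc_id][part]
    for doc_id in identified_ids:          # S1501-S1506: identified documents
        if doc_id in acquired:             # S1502: already acquired, skip
            continue
        if doc_id in store:                # assumption: cited document may be absent
            acquired[doc_id] = store[doc_id][part]   # S1503-S1504
    return acquired


store = {
    "D1": {"title": "Sentence extraction using SVM"},
    "D2": {"title": "Sentence shortening for caption generation"},
}
print(acquire_documents(store, ["D1"], ["D1", "D2", "D9"]))
```

  • Checking the temporary storage before acquiring corresponds to the determination in step S1502 and avoids acquiring the same document information twice.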
  • For example, assume that the name of the main topic is "sentence extraction" and that the document information acquisition unit 1015 has acquired the part of the document information "sentence extraction using SVM". In this case, the main topic name "sentence extraction" and the technical term "SVM" in that field are associated with each other via the clue phrase "using". Similarly, if the main topic name is "caption generation" and the document information acquisition unit 1015 has acquired the part of the document information "sentence shortening for caption generation", the main topic name "caption generation" is associated with the technical term "sentence shortening" in that field via the clue phrase "for". Even if all or part of the document information acquired by the document information acquisition unit 1015 does not include the name of the main topic, it is considered possible to identify technical terms by using clue phrases.
  • FIG. 22 is a block diagram showing the configuration of the technical term extraction unit 1016 in Pattern 1.
  • The technical term extraction unit 1016 includes clue phrase storage means 1021, clue phrase search means 1022, and technical term extraction means 1023. It should be noted that all or part of the document information acquired by the document information acquisition unit 1015 is stored in the document information temporary storage means 1020.
  • The document information temporary storage means 1020 may be included, for example, in the technical term extraction unit 1016, in the document information acquisition unit 1015, or in another component.
  • The clue phrase storage means 1021 stores one or more clue phrases.
  • A clue phrase is a phrase that serves as a clue for extracting technical terms, and is determined in advance so that appropriate technical terms can be extracted. Strictly speaking, what is stored is information indicating a clue phrase, but it is simply called a clue phrase.
  • The clue phrase storage means 1021 can be realized by a predetermined recording medium (for example, a semiconductor memory, a magnetic disk, an optical disk, etc.). There is no limitation on the process by which clue phrases come to be stored in the clue phrase storage means 1021.
  • For example, clue phrases may be stored in the clue phrase storage means 1021 via a recording medium, clue phrases transmitted via a communication line or the like may be stored there, or clue phrases input via an input device may be stored there.
  • The clue phrase search means 1022 searches all or part of the document information acquired by the document information acquisition unit 1015 for the clue phrases stored in the clue phrase storage means 1021.
  • The technical term extraction means 1023 extracts, as a technical term, a term having a predetermined relationship with the clue phrase found by the clue phrase search means 1022.
  • The term having a predetermined relationship with the clue phrase is, for example, a term before the clue phrase or a term after the clue phrase. This relationship may be determined for each clue phrase, or for each attribute of the clue phrase.
  • The attribute of a clue phrase may be, for example, the relationship by which the clue phrase relates the technical term to the main topic. Examples of clue phrase attributes include “Method” and “Goal”.
  • The technical term extraction means 1023 temporarily stores the extracted technical terms in a recording medium (not shown).
  • The technical term extraction means 1023 may, for example, extract as a technical term a continuous noun phrase having a predetermined relationship with a clue phrase. Alternatively, candidates for terms that can be technical terms may be stored in a recording medium (not shown), and a term that has a predetermined relationship with the clue phrase and that matches one of the candidates may be extracted as a technical term.
  • Step S1602 The clue phrase search means 1022 searches all or part of the i-th document information acquired by the document information acquisition unit 1015 for the j-th clue phrase stored in the clue phrase storage means 1021. If it is found, the process proceeds to step S1603. If not, the process proceeds to step S1606.
  • Step S1603 The technical term extraction means 1023 extracts, as a technical term, a term having a predetermined relationship with the clue phrase that was found.
  • The technical term extraction means 1023 may temporarily store the extracted technical terms in a recording medium (not shown).
  • Step S1604 The technical term extraction unit 1016 increments the counter i by 1, and sets the counter j to 1.
  • Step S1605 The technical term extraction unit 1016 determines whether or not the i-th document information acquired by the document information acquisition unit 1015 exists. If it exists, the process returns to step S1602. If not, the series of processes for extracting technical terms ends, and the process proceeds to step S1106.
  • Step S1606 The technical term extraction unit 1016 increments the counter j by 1.
  • Step S1607 The technical term extraction unit 1016 determines whether or not the j-th clue phrase exists in the clue phrase storage means 1021. If it exists, the process returns to step S1602; if not, the process proceeds to step S1604.
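The Pattern 1 loop over documents (counter i) and clue phrases (counter j) can be sketched as follows. This is a simplified illustration under stated assumptions: the stopword list, the helper names, and the "before"/"after" setting attached to each clue phrase are hypothetical, and a run of non-stopword tokens stands in for the continuous noun phrase that a morphological analyzer would supply. The side is configurable because the text allows the term to be before or after the clue phrase, and in the English rendering of the titles the term follows “using”.

```python
# Assumed stopword list; it only delimits the approximate noun phrase.
STOPWORDS = {"a", "an", "the", "of", "in", "for", "and", "to", "on"}


def noun_run(words, start, step):
    """Collect consecutive non-stopword tokens from `start` in direction `step`."""
    run = []
    i = start
    while 0 <= i < len(words) and words[i].lower() not in STOPWORDS:
        run.append(words[i])
        i += step
    return " ".join(run if step > 0 else reversed(run))


def extract_terms(titles, clue_phrases):
    """Return (title index, term) pairs; the first clue phrase found in a title is used."""
    results = []
    for i, title in enumerate(titles):             # counter i: documents
        words = title.split()
        for clue, side in clue_phrases:            # counter j: clue phrases
            clue_words = clue.lower().split()
            n = len(clue_words)
            pos = next((k for k in range(len(words) - n + 1)
                        if [w.lower() for w in words[k:k + n]] == clue_words), None)
            if pos is None:
                continue                           # S1606/S1607: try the next clue phrase
            if side == "before":
                term = noun_run(words, pos - 1, -1)
            else:
                term = noun_run(words, pos + n, +1)
            if term:
                results.append((i, term))          # S1603: extract the term
            break                                  # S1604: go on to the next document
    return results


pairs = extract_terms(
    ["sentence extraction using SVM", "Sentence shortening for caption generation"],
    [("using", "after"), ("for", "before")],
)
```

With the two example titles from the text, the sketch yields “SVM” for the first and “Sentence shortening” for the second.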
  • If at least the area of the document information that includes the search term searched for by the search unit 1013 is acquired by the document information acquisition unit 1015, technical terms can be extracted using the search term. However, when the document information acquisition unit 1015 does not acquire the area of the document information that includes the search term, or when the document information specified by the specifying unit 1014 does not include the search term, it may not be possible to extract technical terms using the search term. The fact that technical terms can be extracted using search terms also applies to the following patterns.
  • FIG. 24 is a block diagram showing the configuration of the technical term extraction unit 1016 in Pattern 2.
  • The technical term extraction unit 1016 includes clue phrase storage means 1021, clue phrase search means 1022, technical term storage means 1024, and technical term extraction means 1025.
  • The configurations and operations other than those of the technical term storage means 1024 and the technical term extraction means 1025 are the same as those in pattern 1, and the description thereof is omitted.
  • The technical term storage means 1024 stores two or more technical terms.
  • These technical terms are terms that can serve as technical terms in one or more research fields. Strictly speaking, what is stored is information indicating a technical term, but it is simply called a technical term.
  • These technical terms may be determined by a person or may be obtained mechanically from a plurality of technical documents. In the latter case, for example, single nouns and the like may be extracted by morphological analysis of technical documents, each noun may be given a score indicating how likely it is to be a technical term, and the single nouns given a high score may be acquired as technical terms.
  • Several methods for assigning such scores are known, such as scoring based on word-forming ability and scoring based on frequency of appearance.
  • The technical term storage means 1024 can be realized by a predetermined recording medium (for example, a semiconductor memory, a magnetic disk, an optical disk, etc.). Further, the process by which technical terms come to be stored in the technical term storage means 1024 is not limited. For example, technical terms may be stored in the technical term storage means 1024 via a recording medium, technical terms transmitted via a communication line or the like may be stored there, or technical terms input via an input device may be stored there.
  • When a term having a predetermined relationship with the clue phrase found by the clue phrase search means 1022 includes a technical term stored in the technical term storage means 1024, the technical term extraction means 1025 extracts that technical term as a technical term. In this way, terms other than those stored in the technical term storage means 1024 are not extracted as technical terms, which avoids extracting non-technical terms such as “method”.
  • Step S1701 The technical term extraction means 1025 extracts a term having a predetermined relationship with the clue phrase that was found.
  • Step S1702 The technical term extraction means 1025 sets the counter k to 1.
  • Step S1703 The technical term extraction means 1025 determines whether or not the term extracted in step S1701 includes the k-th technical term stored in the technical term storage means 1024. If it is included, the process proceeds to step S1704. If not, the process proceeds to step S1705.
  • Step S1704 The technical term extraction means 1025 extracts the k-th technical term included in the term extracted in step S1701 as a technical term.
  • Step S1705 The technical term extraction means 1025 increments the counter k by 1.
  • Step S1706 The technical term extraction means 1025 determines whether or not the k-th technical term exists in the technical term storage means 1024. If it exists, the process returns to step S1703, and if not, the process proceeds to step S1604.
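Steps S1702 to S1706 amount to filtering the term found next to the clue phrase against the stored technical terms. A minimal sketch follows; the function name and the use of case-insensitive substring containment as the meaning of "includes" are assumptions for illustration.

```python
def filter_by_dictionary(extracted_term, stored_terms):
    """Return the stored technical terms contained in the extracted term."""
    found = []
    lowered = extracted_term.lower()
    for known in stored_terms:           # counter k over the stored terms
        if known.lower() in lowered:     # S1703: is the k-th term included?
            found.append(known)          # S1704: extract it as a technical term
    return found


# Hypothetical stored terms; "method" is deliberately absent.
stored = ["context tree", "SVM"]
hits = filter_by_dictionary("context tree method", stored)
```

Because only stored terms can be returned, a bare word such as “method” yields nothing, which is the behavior the pattern is meant to guarantee.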
  • FIG. 26 is a block diagram showing the configuration of the technical term extraction unit 1016 in Pattern 3.
  • The technical term extraction unit 1016 in this pattern extracts technical term candidates and selects technical terms from those candidates.
  • The technical term extraction unit 1016 includes clue phrase storage means 1021, clue phrase search means 1022, technical term candidate extraction means 1026, and technical term selection means 1027.
  • The configuration and operation other than those of the technical term candidate extraction means 1026 and the technical term selection means 1027 are the same as those in pattern 1 and will not be described.
  • The technical term candidate extraction means 1026 extracts, as technical term candidates, terms having a predetermined relationship with the clue phrase found by the clue phrase search means 1022.
  • This technical term candidate extraction method is the same as the technical term extraction method described in patterns 1 and 2, and a description thereof will be omitted.
  • The technical term selection means 1027 selects all or part of the technical term candidates extracted by the technical term candidate extraction means 1026 as technical terms. The specific selection method will be described later in patterns 4 and 5.
  • Next, the operation of the technical term extraction unit 1016 in Pattern 3 will be described using the flowchart in FIG. The process of the flowchart of FIG. 27 corresponds to the technical term extraction process (step S1105) in the flowchart of FIG. In this flowchart, the processes other than steps S1801 and S1802 are the same as those in the flowchart of FIG.
  • Step S1801 The technical term candidate extraction means 1026 extracts, as a technical term candidate, a term having a predetermined relationship with the clue phrase that was found.
  • The technical term candidate extraction means 1026 may temporarily store the extracted technical term candidates on a recording medium (not shown).
  • Step S1802 The technical term selection means 1027 selects a technical term from the technical term candidates extracted by the technical term candidate extraction means 1026.
  • The technical term selection means 1027 may temporarily store the selected technical term in a recording medium (not shown).
  • For example, the technical term selection means 1027 may refrain from selecting as technical terms those technical term candidates that are not used as a main topic in any of the document information stored in the document information storage unit 1012, and may select as technical terms those candidates that are used as a main topic in some document information. This is because a term that should be selected as a technical term is considered highly likely to be treated as a main topic in some technical document.
  • In this case, terms indicating main topics may be collected and stored in a recording medium, and a technical term candidate that matches one of the main topic terms stored in that recording medium may be selected as a technical term. Alternatively, whether a technical term candidate is used as a term indicating a main topic may be determined by searching the plurality of document information stored in the document information storage unit 1012. Whether or not a certain term is used as a term indicating the main topic in document information can be determined, for example, in the same way as described for the search processing in the search unit 1013 using the search term.
  • For example, if a term appears in the document information a predetermined number of times or more, it may be determined that the term is a main topic; or if a term appears in a plurality of predetermined areas (for example, the title and the abstract), the term may be determined to be a main topic. Needless to say, the determination may be made by other methods. For example, the structure of the document information may be analyzed to determine that a term existing in the area where the main topic is described is the main topic. Further, the method of selecting technical terms by the technical term selection means 1027 is not limited to this and may be another method. Other methods will be described later in patterns 4 and 5.
  • In this way, technical terms can be extracted by selecting them from the technical term candidates.
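The two main-topic heuristics mentioned above (a minimum number of occurrences, or presence in several predetermined areas) can be sketched as follows. The threshold of 3, the area names, and the representation of document information as a dictionary from area name to text are assumptions for illustration only.

```python
def is_main_topic(term, doc, min_count=3, areas=("title", "abstract")):
    """Heuristic check: frequent anywhere in the document, or present in every listed area."""
    t = term.lower()
    total = sum(text.lower().count(t) for text in doc.values())
    in_all_areas = all(t in doc.get(area, "").lower() for area in areas)
    return total >= min_count or in_all_areas


def select_main_topic_candidates(candidates, docs):
    """Keep candidates that are a main topic in at least one document."""
    return [c for c in candidates if any(is_main_topic(c, d) for d in docs)]


# Hypothetical document information with separately identifiable areas.
docs = [{
    "title": "Morphological analysis using context tree",
    "abstract": "We study morphological analysis of Japanese text.",
    "body": "The proposed analyzer is evaluated on newspaper articles.",
}]
selected = select_main_topic_candidates(["morphological analysis", "context tree"], docs)
```

In this example “morphological analysis” is kept because it appears in both the title and the abstract, while “context tree” appears only once and only in the title.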
  • FIG. 28 is a block diagram showing the configuration of the technical term extraction unit 1016 in Pattern 4.
  • The technical term extraction unit 1016 includes clue phrase storage means 1021, clue phrase search means 1022, technical term candidate extraction means 1026, technical term selection means 1027, and technical term suffix storage means 1028.
  • The configuration and operation other than those of the technical term suffix storage means 1028 are the same as those in pattern 3, and the description thereof is omitted.
  • The technical term suffix storage means 1028 stores one or more suffixes that technical terms can take. The suffixes stored in the technical term suffix storage means 1028 are, for example, “model”, “law”, “algorithm”, and the like. Strictly speaking, what is stored is information indicating a suffix, but it is simply called a suffix.
  • The technical term suffix storage means 1028 can be realized by a predetermined recording medium (for example, a semiconductor memory, a magnetic disk, an optical disk, etc.). The process by which suffixes come to be stored in the technical term suffix storage means 1028 is not limited.
  • For example, suffixes may be stored in the technical term suffix storage means 1028 via a recording medium, or suffixes transmitted via a communication line or the like may be stored there.
  • Alternatively, suffixes input via an input device may be stored in the technical term suffix storage means 1028.
  • The technical term selection means 1027 selects, as a technical term, a technical term candidate having a suffix stored in the technical term suffix storage means 1028.
  • FIG. 29 is a flowchart showing details of the technical term selection process (step S 1802) in the flowchart of FIG. 27.
  • Step S1901 The technical term selecting means 1027 sets the counter i to 1 and sets the counter j to 1.
  • Step S1902 The technical term selection means 1027 determines whether or not the i-th technical term candidate extracted by the technical term candidate extraction means 1026 has the j-th suffix stored in the technical term suffix storage means 1028. If so, the process proceeds to step S1903; otherwise, the process proceeds to step S1906.
  • Step S1903 The technical term selection means 1027 selects the i-th technical term candidate as a technical term.
  • The technical term selection means 1027 may temporarily store the selected technical term on a recording medium (not shown).
  • Step S1904 The technical term selection means 1027 increments the counter i by 1 and sets the counter j to 1.
  • Step S1905 The technical term selection means 1027 determines whether or not the i-th technical term candidate exists. If it exists, the process returns to step S1902; if it does not exist, the series of processes for selecting technical terms ends, and the process proceeds to step S1106.
  • Step S1906 The technical term selection means 1027 increments the counter j by 1.
  • Step S1907 The technical term selection means 1027 determines whether or not the j-th suffix exists in the technical term suffix storage means 1028. If it exists, the process returns to step S1902; if not, the process proceeds to step S1904.
  • In this way, technical terms can be extracted by selecting them from the technical term candidates.
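The suffix-based selection of steps S1901 to S1907 can be sketched compactly. The suffix list reuses the examples given in the text; the candidate strings and the case-insensitive comparison are assumptions for illustration.

```python
# Suffix examples taken from the text as translated.
SUFFIXES = ["model", "law", "algorithm"]


def select_by_suffix(candidates, suffixes=SUFFIXES):
    """Keep candidates that end with any stored suffix (counters i and j collapsed)."""
    return [c for c in candidates
            if any(c.lower().endswith(s) for s in suffixes)]


# Hypothetical technical term candidates.
chosen = select_by_suffix(["hidden Markov model", "Viterbi algorithm", "experiment"])
```

The nested counters of the flowchart reduce to "for each candidate, is there any matching suffix", which is what the `any(...)` expression states.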
  • FIG. 30 is a block diagram showing the configuration of the technical term extraction unit 1016 in Pattern 5.
  • The technical term extraction unit 1016 includes clue phrase storage means 1021, clue phrase search means 1022, technical term candidate extraction means 1026, technical term selection means 1027, technical term suffix storage means 1028, and document information specifying means 1029. Except as described below, the configuration and operation other than those of the document information specifying means 1029 are the same as those in pattern 4, and the description thereof is omitted.
  • The document information acquisition unit 1015 also acquires all or part of the document information specified by the document information specifying means 1029. Further, when at least one of the technical term candidates extracted from all or part of a group of document information acquired by the document information acquisition unit 1015 has a suffix stored in the technical term suffix storage means 1028, the technical term selection means 1027 selects a predetermined technical term candidate as a technical term from among the technical term candidates extracted from all or part of that group of document information.
  • Here, a group of document information is a set consisting of one piece of document information searched for by the search unit 1013 or specified by the specifying unit 1014 (referred to as “document information N”) and one or more pieces of document information specified by the document information specifying means 1029 as being in the same field as document information N.
  • The “predetermined technical term candidate” selected as a technical term by the technical term selection means 1027 is a technical term candidate extracted from all or part of the document information N.
  • The document information specifying method used by the document information specifying means 1029 is the same as the document information specifying method used by the specifying unit 1014.
  • The document information specifying means 1029 may specify, for example, document information that has a citation relationship with document information N.
  • The citation relationship may be a relationship in which document information N cites other document information, a relationship in which document information N is cited in other document information, or a relationship including both. Further, document information whose citation format in the citation relationship is the problem-pointing citation format, that is, a format in which a technical document is cited in order to point out a problem with it, may be specified.
  • The operation of the document information specifying means 1029 is the same as the operation of the specifying unit 1014, or can easily be inferred from it, so a detailed description is omitted.
  • FIG. 31 is a flowchart showing details of the technical term selection process (step S 1802) in the flowchart of FIG.
  • the processes other than step S11001 are the same as those in the flowchart of FIG. 29, and the description thereof is omitted.
  • Step S11001 The technical term selection means 1027 selects technical term candidates using the document information specified by the document information specifying means 1029. Details of this processing will be described later.
  • Next, the details of the technical term candidate selection processing (step S11001) in the flowchart of FIG. 31 will be described using the flowchart of FIG.
  • The document information specifying means 1029 specifies document information in the same field as the document information that includes the i-th technical term candidate. For example, when specifying document information that is cited in the document information including the i-th technical term candidate and whose citation format is type C, this can be done by the process shown in the flowchart of FIG. Note that if, for example, the document information temporary storage means 1020 also stores a document ID or the like identifying the document information, the document information specifying means 1029 can use that document ID or the like to specify the document information that includes the technical term candidate.
  • Step S11102 The document information acquisition unit 1015 acquires all or part of each piece of document information specified by the document information specifying means 1029. This acquisition process can be performed in the same manner as the process of step S1104, for example. When all or part of the document information specified by the document information specifying means 1029 has already been acquired, the document information acquisition unit 1015 may acquire it from the recording medium in which it is stored, or may acquire it from the document information storage unit 1012.
  • Step S11103 Technical term candidate extraction means 1026 extracts technical term candidates from all or part of the literature information acquired in step S11102. This extraction process can be performed, for example, in the same manner as the process before step S1802 in the flowchart of FIG.
  • Step S11104 Technical term selecting means 1027 sets counter m to 1 and sets counter n to 1.
  • Step S11105 The technical term selection means 1027 determines whether or not the n-th technical term candidate extracted by the technical term candidate extraction means 1026 in step S11103 has the m-th suffix stored in the technical term suffix storage means 1028. If it does, the process proceeds to step S11110; if not, the process proceeds to step S11106.
  • Step S11106 The technical term selection means 1027 increments the counter m by 1.
  • Step S11107 The technical term selection means 1027 determines whether or not the m-th suffix exists in the technical term suffix storage means 1028. If it exists, the process returns to step S11105; if not, the process proceeds to step S11108.
  • Step S11108 The technical term selection means 1027 increments the counter n by 1, and sets the counter m to 1.
  • Step S11109 The technical term selection means 1027 determines whether or not the n-th technical term candidate exists. If it exists, the process returns to step S11105. If it does not exist, the series of processes for selecting a technical term using the specified document information ends, and the process proceeds to step S1904.
  • Step S11110 The technical term selection means 1027 selects the i-th technical term candidate as a technical term.
  • the technical term selection means 1027 may temporarily store the selected technical term on a recording medium (not shown).
  • In this way, technical terms can be extracted by selecting them from the technical term candidates.
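Pattern 5 can be read as follows: a candidate from document information N is selected if it has a stored suffix itself, or if some candidate extracted from same-field document information has one. The sketch below follows that reading; the data layout (a list of `(doc_id, term)` pairs and a mapping from document ID to the candidates of its related documents) and the function names are hypothetical.

```python
def has_suffix(term, suffixes):
    """True if the term ends with any stored suffix (case-insensitive)."""
    return any(term.lower().endswith(s) for s in suffixes)


def select_pattern5(candidates, related, suffixes):
    """candidates: (doc_id, term) pairs from the searched or specified documents.
    related: doc_id -> candidates extracted from same-field documents."""
    selected = []
    for doc_id, term in candidates:
        if has_suffix(term, suffixes):
            selected.append(term)          # selected directly, as in pattern 4
        elif any(has_suffix(r, suffixes) for r in related.get(doc_id, [])):
            selected.append(term)          # cf. steps S11105 and S11110
    return selected


# Hypothetical example data.
picked = select_pattern5(
    [("001", "context tree"), ("002", "experiment")],
    {"001": ["decision tree algorithm"], "002": ["evaluation"]},
    ["model", "law", "algorithm"],
)
```

Here “context tree” has no stored suffix, but a same-field document yields “decision tree algorithm”, so it is still selected; “experiment” has no such support and is dropped.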
  • Variations in how the technical term extraction unit 1016 extracts technical terms have been described above, but it goes without saying that technical terms may be extracted by methods other than those described.
  • The extracted technical term is a term used in the field of the technical documents searched for by the search unit 1013, and is preferably a technical term for an elemental technology in that field.
  • For example, technical terms may be extracted by analyzing all or part of the document information acquired by the document information acquisition unit 1015. A method of extracting technical terms by analyzing all or part of the document information is briefly described below.
  • The technical term extraction unit 1016 can extract the term “SVM” corresponding to the “Method” tag from this analysis result as a technical term.
  • The term sandwiched between the tags may be extracted.
  • In the case of the “Goal” tag, the term sandwiched between the tags and a clue phrase of the “Goal” attribute are identified, and the term associated by that clue phrase (usually the term after the “Goal” tags) is extracted.
  • For example, when the document information acquisition unit 1015 acquires “Sentence shortening for caption generation in a news program”, the structural analysis encloses “caption generation” in “Goal” tags, and the technical term extraction unit 1016 uses the term “caption generation” sandwiched between the “Goal” tags and the “Goal” clue phrase “for” to extract “sentence shortening” as a technical term. Note that structure tags other than those described in FIG. 33 may be used. However, the “Method” tag and the “Goal” tag are generally used to extract technical terms. Further, the technical term extraction unit 1016 may extract technical terms using the search terms.
  • Next, the operation of the technology trend information creation device 1001 according to the present embodiment will be described using a specific example.
  • In this example, the technology trend information creation device 1001 is a stand-alone device.
  • The document information and the like stored in the document information storage unit 1012 is as shown in FIG.
  • A document ID for identifying a technical document, time information, and document information are associated with each other.
  • The document information “D001” or the like is data indicating each piece of document information, and at least the title area can be identified in each piece of document information.
  • First, the search term “morphological analysis” is received by the search term receiving unit 1011 (step S1101) and passed to the search unit 1013.
  • The search unit 1013 searches for document information in which the search term “morphological analysis” is included in the title area (step S1102).
  • The search unit 1013 obtains the document IDs “001, 003, 010...” corresponding to the document information in which the search term “morphological analysis” is included in the title area, and passes them to the specifying unit 1014 and the document information acquisition unit 1015.
  • The specifying unit 1014 specifies document information in the same field as the document information identified by the document IDs received from the search unit 1013 (step S1103).
  • Here, it is assumed that document information cited in the type C citation format within the document information identified by the document IDs received from the search unit 1013 is specified. This process is described concretely below.
  • The specifying unit 1014 acquires cited document identification information and citation information from the document information identified by the first document ID “001” received from the search unit 1013 (steps S1201 and S1202).
  • The specifying unit 1014 reads the document information from the document information storage unit 1012 and acquires, as cited document identification information, the bibliographic information (title, date of publication, author, publication medium, etc.) of the documents listed in its “reference document” column.
  • Each bibliographic entry is labeled with an index such as 1, 2, 3, ...; 1), 2), 3), ...; [1], [2], [3], ...; or [A], [B], [C], ...; so that index is also acquired.
  • The specifying unit 1014 performs a search using the acquired index in the document information, and specifies a sentence including the index. Then, the sentence including the index and the sentences closely related to that sentence are acquired as citation information.
  • It is assumed that the specifying unit 1014 holds the following two types of clue phrases on a recording medium (not shown).
  • The specifying unit 1014 sets a sentence including an index as a candidate sentence of citation information. Then, if the first sentence of the candidate sentence contains at least one of the (A) pre-extension clue phrases, the sentence immediately before the candidate sentence is included in the candidate sentence.
  • The specifying unit 1014 repeats this process of extending the candidate sentence backward until the first sentence of the candidate sentence contains none of the (A) pre-extension clue phrases.
  • Likewise, if the sentence following the candidate sentence (the sentence adjacent to the candidate sentence and not included in it) contains at least one of the (B) post-extension clue phrases, the specifying unit 1014 includes that sentence in the candidate sentence.
  • The specifying unit 1014 repeats this process of extending the candidate sentence forward until the sentence following the candidate sentence contains none of the (B) post-extension clue phrases.
  • The candidate sentence at the time when the process of extending the candidate sentence is completed becomes the citation information.
  • In this way, the specifying unit 1014 acquires the citation information.
  • The (A) pre-extension clue phrases and (B) post-extension clue phrases are examples; other clue phrases may be used as long as sentences closely related to the sentence containing the index can be extracted. This example focuses on the sentence containing the index, but in technical documents that refer to the cited document directly by its cited document identification information, the sentence containing the cited document identification information is used instead of the sentence containing the index.
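The extension of the candidate sentence can be sketched as follows. The concrete clue phrases of lists (A) and (B) are not reproduced in this section, so the phrases used in the example (“This method”, “However”) are hypothetical placeholders, as are the function name and the plain substring test.

```python
def citation_span(sentences, idx, pre_clues, post_clues):
    """Return (start, end) indices of the citation information around sentence idx."""
    start = end = idx
    # Extend backward while the first sentence of the span contains
    # a pre-extension clue phrase (list (A) in the text).
    while start > 0 and any(c in sentences[start] for c in pre_clues):
        start -= 1
    # Extend forward while the sentence just after the span contains
    # a post-extension clue phrase (list (B) in the text).
    while end + 1 < len(sentences) and any(c in sentences[end + 1] for c in post_clues):
        end += 1
    return start, end


# Hypothetical document text; sentence 1 contains the index "[1]".
sentences = [
    "Tanaka proposed a tagger.",
    "This method [1] uses a context tree.",
    "However, it is slow.",
    "We propose a new model.",
]
span = citation_span(sentences, 1, pre_clues=["This method"], post_clues=["However"])
```

The returned span covers the first three sentences: the sentence with the index, the one before it (pulled in by the backward clue), and the one after it (pulled in by the forward clue).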
  • The specifying unit 1014 determines, using the acquired citation information, whether or not the citation format is type C.
  • The specifying unit 1014 holds, on a recording medium (not shown), the information shown in FIG. 35 that associates each type C clue phrase with the position of that clue phrase.
  • The position of a clue phrase is expressed relative to the sentence containing the index or the cited document identification information, for example as the sentence before that sentence. If a type C clue phrase is included in the sentence indicated by the position of the corresponding clue phrase, the citation format of the citation information is determined to be type C. Specifically, if any sentence from the next sentence (first sentence) to the third sentence after the sentence containing the index or cited document identification information contains “However”, the citation format of the citation information is determined to be type C (step S1205).
  • The specifying unit 1014 temporarily stores the cited document identification information corresponding to the citation information whose citation format was determined to be type C (step S1206). For example, the cited document identification information corresponding to an index included in citation information whose citation format is type C is temporarily stored.
  • The specifying unit 1014 obtains citation information and cited document identification information for all document IDs received from the search unit 1013, and temporarily stores the cited document identification information whose citation format is type C (steps S1201 to S1209).
  • The temporarily stored cited document identification information is passed to the document information acquisition unit 1015.
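The type C check described above (a clue phrase such as “However” within the first to third sentences after the citing sentence) can be sketched as follows. The function name, the default window of three sentences taken from the example, and the plain substring test are assumptions; the full clue phrase table of FIG. 35 is not reproduced here.

```python
def is_type_c(citation_sentences, cite_idx, clue="However", window=3):
    """True if the clue phrase occurs in any of the `window` sentences
    following the sentence that contains the index."""
    following = citation_sentences[cite_idx + 1: cite_idx + 1 + window]
    return any(clue in s for s in following)


# Hypothetical citation information; sentence 0 contains the index "[1]".
critical = is_type_c(["Tanaka [1] proposed a tagger.", "However, it is slow."], 0)
neutral = is_type_c(["Tanaka [1] proposed a tagger.", "We extend it."], 0)
```

Only citations for which the check returns true would have their cited document identification information stored in step S1206.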
  • The document information acquisition unit 1015 acquires, from the document information storage unit 1012, the titles of the document information identified by the document IDs “001, 003, 010...” received from the search unit 1013 (step S1301). In addition, the document information acquisition unit 1015 searches the document information storage unit 1012 using the cited document identification information received from the specifying unit 1014, and when the title of the document information identified by the cited document identification information has not yet been acquired, acquires that title (steps S1302, S1303, S1501 to S1506). In this way, the document information acquisition unit 1015 acquires a part of the document information (step S1104).
  • FIG. 36 is a diagram showing an example of a part (title) of document information acquired in this way.
  • the technical term extraction unit 1016 will be described.
  • the information shown in FIG. 36 is stored in the document information temporary storage means 1020.
  • the clue word phrase storage means 1021 stores the clue word phrase shown in FIG.
  • The “$” is a symbol indicating the position of the technical term to be extracted. For example, when a title acquired by the document information acquisition unit 1015 contains the clue phrase “by”, the continuous noun phrase before “by” is extracted as a technical term.
  • The first title “morphological analysis using context tree” shown in FIG. 36 does not include the clue phrases “by” and “based on”, so it is determined that these clue phrases are not included in the title (steps S1601, S1602, S1606, S1607).
  • The clue phrase search means 1022 then searches for the clue phrase “using” in the first title “morphological analysis using context tree”. Since “using” is included in the title (step S1602), the clue phrase search means 1022 passes the clue phrase to the technical term extraction means 1023 with an instruction to extract the continuous noun phrase before “using”.
  • The technical term extraction means 1023 performs morphological analysis on the title “morphological analysis using context tree” and extracts the continuous noun phrase “context tree” before the clue phrase “using” as a technical term (step S1603). The technical term “context tree” and the corresponding document ID “001” are stored in a recording medium (not shown). In this way, technical terms are extracted (step S1105).
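  • The clue phrase search and extraction walked through above can be sketched as follows. This is only an illustration under stated assumptions: the title is assumed to have been morphologically analyzed already into (surface, part of speech) pairs in the word order of the original Japanese title, so the noun phrase to extract precedes the clue phrase. The token list, the part-of-speech labels, and the function name are hypothetical.

```python
CLUE_PHRASES = ["by", "based on", "using"]  # tried in this order, as in the walkthrough


def extract_term(tokens, clue_phrases=CLUE_PHRASES):
    """tokens: list of (surface, pos) pairs from a morphological analyzer.
    Returns the run of consecutive nouns immediately before the first
    clue phrase found, or None when no clue phrase matches."""
    surfaces = [surface for surface, _ in tokens]
    for clue in clue_phrases:
        if clue not in surfaces:
            continue  # this clue phrase is not in the title; try the next one
        position = surfaces.index(clue)
        noun_run = []
        j = position - 1
        while j >= 0 and tokens[j][1] == "noun":
            noun_run.insert(0, tokens[j][0])
            j -= 1
        if noun_run:
            return " ".join(noun_run)
    return None


# Hypothetical analysis of the first title, in source-language word order.
title_001 = [("context", "noun"), ("tree", "noun"), ("using", "clue"),
             ("morphological", "noun"), ("analysis", "noun")]
term = extract_term(title_001)
```

  • With this input the sketch yields “context tree”, matching the result described for document ID “001”.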
  • FIG. 38 is a diagram showing an example of information associating technical terms extracted in this way with document IDs. Note that the technical term extraction unit 1016 passes the document ID of the information shown in FIG. 38 to the time information acquisition unit 1017. The technical term extraction unit 1016 passes the information shown in FIG. 38 to the technical trend information creation unit 1018.
  • Upon receiving the document ID, the time information acquisition unit 1017 searches the document information storage unit 1012 using the document ID as a search key, and acquires the time information corresponding to the document ID (step S1106).
  • FIG. 39 is a diagram showing an example of information associating the time information thus obtained with the document ID.
  • the time information acquisition unit 1017 shall pass the information shown in FIG. 39 to the technology trend information creation unit 1018.
  • When the technical trend information creation unit 1018 receives the information shown in FIG. 38 from the technical term extraction unit 1016 and the information shown in FIG. 39 from the time information acquisition unit 1017, it groups the time information by technical term. It then creates an image in which the technical terms are listed along the vertical axis, time runs along the horizontal axis, a circle is placed at the corresponding time for each technical term, and a line connects the leftmost and rightmost circles (step S1107). The image also contains the search term, which is received via a route not shown. To generate the image of the technology trend information, the technology trend information creation unit 1018 may hold a template or the like in a recording medium (not shown) and generate the image using that template or the like.
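  • The grouping that precedes drawing can be sketched as follows. This is a minimal sketch, assuming the FIG. 38 information is a list of (technical term, document ID) pairs and the FIG. 39 information is a list of (document ID, year) pairs; the function name, the output layout, and the years in the sample data are made up for illustration.

```python
from collections import defaultdict


def build_trend(term_rows, time_rows):
    """Group time information by technical term.
    Returns, per term, the times at which circles are drawn ("marks") and
    the (leftmost, rightmost) pair joined by a line ("line")."""
    year_of = dict(time_rows)
    years = defaultdict(set)
    for term, doc_id in term_rows:
        if doc_id in year_of:
            years[term].add(year_of[doc_id])
    return {term: {"marks": sorted(ys), "line": (min(ys), max(ys))}
            for term, ys in years.items()}


# Hypothetical rows; the years are not taken from the specification.
term_rows = [("context tree", "001"),
             ("minimum cost method", "003"),
             ("minimum cost method", "010")]
time_rows = [("001", 2001), ("003", 1994), ("010", 1998)]
trend = build_trend(term_rows, time_rows)
```

  • A renderer would then place one circle per entry in "marks" on the row for that term and draw a line between the two values in "line".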
  • the output unit 1019 displays an image of the technology trend information created by the technology trend information creation unit 1018 on the display (step S1108).
  • FIG. 40 is a diagram showing an example of the image of the technical trend information displayed on the display as described above. In this way, the user can know the technical trend (technical flow) related to morphological analysis.
  • the technical trend information may not be an image.
  • For example, the technical trend information may be the information shown in FIG. 41, in which the information shown in FIG. 38 is combined with the information shown in FIG. 39 using the document ID.
  • the technology trend information shown in FIG. 41 may be spreadsheet data, for example.
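  • A non-image form of this kind can be sketched as a join on the document ID written out as CSV, which a spreadsheet program can open. This is a sketch under assumptions: the column names, the function name, and the sample year are hypothetical.

```python
import csv
import io


def trend_table_csv(term_rows, time_rows):
    """Join (technical_term, document_id) rows with (document_id, time)
    rows on the document ID and return the result as CSV text."""
    time_of = dict(time_rows)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["document_id", "technical_term", "time"])
    for term, doc_id in term_rows:
        # Documents without time information get an empty cell.
        writer.writerow([doc_id, term, time_of.get(doc_id, "")])
    return buffer.getvalue()


# Hypothetical sample; the year is made up for illustration.
table = trend_table_csv([("context tree", "001")], [("001", 2001)])
```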
  • Assume that the technical term storage means 1024 stores the technical terms shown in FIG. 42, and that the terms having a predetermined relationship with the clue phrase found by the clue phrase search means 1022 are “Japanese corpus” and “proposed method”. In this case, since “Japanese corpus” includes “corpus”, which is stored in the technical term storage means 1024, the technical term extraction means 1025 extracts “corpus” as the technical term.
  • The technical term extraction means 1025 does not extract “proposed method”, which is a non-technical term, as a technical term. In this way, the technical term extraction means 1025 extracts the appropriate term “corpus” rather than “Japanese corpus” as a technical term, and avoids extracting general nouns and non-technical terms.
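  • The filtering against stored technical terms can be sketched as follows. This is a minimal sketch, assuming whitespace-separated English words stand in for the output of morphological analysis; the function name and the contents of the stored term list are hypothetical.

```python
def extract_known_terms(candidate, known_terms):
    """Return the stored technical terms that occur as a whole-word
    sequence inside the candidate phrase, longest first."""
    words = candidate.lower().split()
    hits = []
    for term in known_terms:
        term_words = term.lower().split()
        span = len(term_words)
        if any(words[i:i + span] == term_words
               for i in range(len(words) - span + 1)):
            hits.append(term)
    return sorted(hits, key=len, reverse=True)


# Hypothetical contents of the technical term storage means.
stored_terms = ["corpus", "context tree"]
from_corpus = extract_known_terms("Japanese corpus", stored_terms)
from_method = extract_known_terms("proposed method", stored_terms)
```

  • “Japanese corpus” yields “corpus”, while “proposed method” yields nothing, so no technical term is extracted from it.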
  • the technical trend information creation apparatus 1001 extracts technical terms from the literature information, and acquires time information corresponding to the literature information from which the technical terms are extracted.
  • the technology trend information can be created using the extracted technical terms and the acquired time information.
  • the document information acquisition unit 1015 acquires all or part of the document information searched by the search unit 1013.
  • the document information acquisition unit 1015 may acquire all or a part of at least some document information among the plurality of document information stored in the document information storage unit 1012.
  • The document information acquisition unit 1015 may acquire all of the document information stored in the document information storage unit 1012, or a predetermined number of pieces of document information selected at random.
  • the technology trend information creation device 1001 does not need to include the search term reception unit 1011 and the search unit 1013.
  • In this embodiment, the specifying unit 1014 specifies document information in the same field as the document information searched by the search unit 1013, and the document information acquisition unit 1015 also acquires the document information specified by the specifying unit 1014. However, the document information acquisition unit 1015 need not acquire the document information specified by the specifying unit 1014.
  • the technology trend information creation device 1001 may not include the specifying unit 1014.
  • When document information is not specified by the specifying unit 1014 and the cited document identification information or the citation information is not used, the document information need not include the cited document identification information or the citation information.
  • a technical trend information creation device according to Embodiment 5 of the present invention will be described with reference to the drawings.
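  • In the technical trend information creation device according to this embodiment, the search term accepted by the search term acceptance unit is selected from stored search terms.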
  • FIG. 43 is a block diagram showing a configuration of the technical trend information creation device 1002 according to the present embodiment.
  • The technology trend information creation device 1002 includes a search term acceptance unit 1011, a document information storage unit 1012, a search unit 1013, a specifying unit 1014, a document information acquisition unit 1015, a technical term extraction unit 1016, a time information acquisition unit 1017, a technology trend information creation unit 1018, an output unit 1019, a suffix storage unit 1031, a technical term extraction unit 1032, a search term accumulation unit 1033, a search term storage unit 1034, and a search term selection receiving unit 1035.
  • The configuration and operation of the units other than the suffix storage unit 1031, the technical term extraction unit 1032, the search term accumulation unit 1033, the search term storage unit 1034, and the search term selection receiving unit 1035, which are newly added in the present embodiment, are the same as in the fourth embodiment, except that the search term accepting unit 1011 accepts the search term indicated by the selection accepted by the search term selection accepting unit 1035, and the description thereof is omitted.
  • The suffix storage unit 1031 stores predetermined suffixes. These suffixes are used when selecting search terms from the technical terms extracted by the technical term extraction unit 1032 described later, and are, for example, “system”, “method”, “data”, and the like. It goes without saying that other appropriate suffixes may be used.
  • The suffix stored in the suffix storage unit 1031 is actually information indicating a suffix, but is simply referred to as a suffix. The suffix storage unit 1031 can be realized by a predetermined recording medium (for example, a semiconductor memory, a magnetic disk, an optical disk, etc.). The process by which suffixes come to be stored in the suffix storage unit 1031 does not matter.
  • For example, suffixes may be stored in the suffix storage unit 1031 via a recording medium, suffixes transmitted via a communication line or the like may be stored in the suffix storage unit 1031, or suffixes input via an input device may be stored in the suffix storage unit 1031.
  • the technical term extraction unit 1032 extracts technical terms from the literature information stored in the literature information storage unit 1012.
  • The technical term here is a term specific to one or more research fields. It is actually information indicating a technical term, but is simply called a technical term.
  • For example, the technical term extraction unit 1032 extracts single nouns and the like by performing morphological analysis on the document information, assigns each of them a score indicating how likely it is to be a technical term, and obtains single nouns and the like as technical terms according to their scores.
  • Several methods of assigning the score are known, such as scoring based on word-forming ability and scoring based on frequency of appearance. Other methods may also be used.
  • Methods for extracting technical terms are disclosed in the plurality of documents cited in the fourth embodiment and elsewhere, and have been known conventionally.
  • When a term formed by appending a suffix stored in the suffix storage unit 1031 to a technical term extracted by the technical term extraction unit 1032 is included in the document information stored in the document information storage unit 1012, the search term accumulation unit 1033 stores that technical term in the search term storage unit 1034 as a search term.
  • For example, suppose the suffix “system” is stored in the suffix storage unit 1031 and the technical term extraction unit 1032 extracts “morpheme” and “morpheme analysis”. If “morpheme system” is not included in the document information but “morpheme analysis system” is included, “morpheme analysis” is stored as a search term, but “morpheme” is not.
  • The search term storage unit 1034 stores two or more search terms. These search terms are accumulated by the search term accumulation unit 1033.
  • The search term storage unit 1034 can be realized by a predetermined recording medium (for example, a semiconductor memory, a magnetic disk, an optical disk, etc.).
  • the search term selection accepting unit 1035 accepts selection of one of the two or more search terms stored in the search term storage unit 1034.
  • two or more search terms stored in the search term storage unit 1034 may or may not be output.
  • a search term may be displayed on a display or the like, and the user may click a desired search term by operating a mouse or the like, and the search term selection receiving unit 1035 may receive the operation.
  • Alternatively, the user may have a document or the like listing the two or more search terms stored in the search term storage unit 1034 together with an identification symbol corresponding to each search term. The user may then input the identification symbol corresponding to a desired search term by operating a keyboard or the like, and the search term selection receiving unit 1035 may receive that input.
  • the search term receiving unit 1011 receives the search term indicated by the selection received by the search term selection receiving unit 1035 as described above.
  • The search term selection accepting unit 1035 may accept information input from an input device (for example, a keyboard, mouse, touch panel, etc.), or may receive information transmitted via a wired or wireless communication line. Note that the search term selection accepting unit 1035 may or may not include a device (for example, a modem or a network card) for receiving. The search term selection accepting unit 1035 may be realized by hardware, or by software such as a driver for driving a predetermined device.
  • FIG. 44 is a flowchart showing a process replacing the search term acceptance process (step S1101) in the flowchart of FIG. That is, in the technology trend information creation device 1002 according to the present embodiment, the processing of steps S11201 to S11203 shown in FIG. 44 may be performed instead of the processing of step S1101.
  • Step S11201 The search term selection accepting unit 1035 determines whether it has accepted selection of a search term. If the selection has been accepted, the process proceeds to step S11202. If not, the process of step S11201 is repeated until the selection is accepted.
  • Step S11202 The search term selection receiving unit 1035 reads the search term indicated by the accepted selection from the two or more search terms stored in the search term storage unit 1034.
  • Step S11203 The search term selection receiving unit 1035 inputs the search term read from the search term storage unit 1034 in step S11202 to the search term receiving unit 1011. As a result, the search term is accepted by the search term acceptance unit 1011. The process then proceeds to step S1102.
  • In this flowchart, the case where the search term selection receiving unit 1035 reads the search term stored in the search term storage unit 1034 and inputs it to the search term reception unit 1011 has been described. Alternatively, the search term reception unit 1011 may read the search term indicated by the selection received by the search term selection reception unit 1035 from the search term storage unit 1034.
  • Step S11301 The technical term extraction unit 1032 extracts technical terms from the document information stored in the document information storage unit 1012. The method for extracting the technical terms is as described above, and a detailed description thereof is omitted. Note that the extracted technical terms may be temporarily stored in a recording medium (not shown).
  • Step S11302 The search term accumulation unit 1033 sets counter i to 1 and counter j to 1.
  • Step S11303 The search term accumulation unit 1033 searches the document information storage unit 1012 for the term obtained by appending the j-th suffix stored in the suffix storage unit 1031 to the i-th technical term extracted by the technical term extraction unit 1032, and determines whether that term is included in the document information stored in the document information storage unit 1012. If it is included, the process proceeds to step S11304. If not, the process proceeds to step S11307.
  • Step S11304 The search term accumulation unit 1033 stores the i-th technical term in the search term storage unit 1034 as a search term.
  • Step S11305 The search term accumulation unit 1033 increments the counter i by 1 and sets the counter j to 1.
  • Step S11306 The search term accumulation unit 1033 determines whether an i-th technical term exists among the technical terms extracted by the technical term extraction unit 1032. If it exists, the process returns to step S11303; otherwise, the series of processes for accumulating search terms ends.
  • Step S11307 The search term accumulation unit 1033 increments the counter j by 1.
  • Step S11308 The search term accumulation unit 1033 determines whether a j-th suffix is stored in the suffix storage unit 1031. If it is stored, the process returns to step S11303; if not, the process proceeds to step S11305.
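  • Steps S11302 to S11308 amount to a nested loop over technical terms (counter i) and suffixes (counter j). The following is a minimal sketch, assuming the document information is available as plain strings and that a term and a suffix are joined with a space, which is an English-language convenience rather than part of the described process; the function name and the sample document are hypothetical.

```python
def accumulate_search_terms(technical_terms, suffixes, documents):
    """Keep each technical term for which "<term> <suffix>" occurs in at
    least one document, for at least one stored suffix."""
    search_terms = []
    for term in technical_terms:                # counter i
        for suffix in suffixes:                 # counter j
            candidate = f"{term} {suffix}"
            if any(candidate in doc for doc in documents):  # step S11303
                search_terms.append(term)       # step S11304
                break                           # step S11305: next term, j restarts
    return search_terms


# Hypothetical document text for illustration.
documents = ["a morpheme analysis system for Japanese text"]
selected = accumulate_search_terms(
    ["morpheme", "morpheme analysis"], ["system", "method", "data"], documents)
```

  • With this input only “morpheme analysis” is kept, because “morpheme analysis system” occurs in the document while “morpheme system” does not.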
  • The technical trend information creation apparatus 1002 according to this embodiment extracts technical terms, selects search terms from among them, and accumulates the selected terms, so that a search term that appropriately indicates a research field can be accepted by the search term receiving unit 1011. As a result, appropriate technology trend information can be created.
  • In this embodiment, the case where the search term accumulation unit 1033 accumulates search terms in the search term storage unit 1034 has been described, but search terms may be accumulated in the search term storage unit 1034 by other methods.
  • the user or the like may select an appropriate search term and the search term may be stored in the search term storage unit 1034.
  • the process in which the search term is stored in the search term storage unit 1034 is not limited.
  • For example, search terms may be stored in the search term storage unit 1034 via a recording medium, search terms transmitted via a communication line or the like may be stored in the search term storage unit 1034, or search terms input via an input device may be stored in the search term storage unit 1034.
  • In that case, the technology trend information creation device 1002 need not include the suffix storage unit 1031, the technical term extraction unit 1032, and the search term accumulation unit 1033.
  • Embodiment 6 A technical trend information creation device according to Embodiment 6 of the present invention will be described with reference to the drawings.
  • the technical trend information creating apparatus displays information regarding the document information in the displayed image of the technical trend information according to the selection of the document graphic corresponding to the document information.
  • FIG. 46 is a block diagram showing a configuration of the technical trend information creation device 1003 according to this embodiment.
  • the technology trend information creation device 1003 includes a search term acceptance unit 1011, a document information storage unit 1012, a search unit 1013, a specification unit 1014, a document information acquisition unit 1015, a technology A term extraction unit 1016, a time information acquisition unit 1017, a technology trend information creation unit 1018, an output unit 1019, a graphic selection reception unit 1041, and a display information acquisition unit 1042 are provided.
  • Except that the technology trend information creation unit 1018 creates technology trend information in which the display information acquired by the display information acquisition unit 1042 is displayed, the configuration and operation are the same as in the fourth embodiment, and the explanation is omitted.
  • In this embodiment, the technical trend information is assumed to be an image in which a document figure corresponding to document information containing a technical term is displayed at the position of the time indicated by the time information corresponding to that document information.
  • the shape of the document figure is not important.
  • the document figure may be a circle, a triangle, a rectangle, or the like, or another shape.
  • the figure selection receiving unit 1041 receives selection of a document figure in the technology trend information.
  • This selection may be accepted, for example, through a GUI (Graphical User Interface), such as by moving the pointer over a document figure with a mouse or the like (mouse-over).
  • Alternatively, each document figure may be associated with identification information (for example, a number or a letter) for identifying the document figure, and the selection may be accepted when the user inputs that identification information by operating a keyboard or the like.
  • The figure selection receiving unit 1041 may accept information input from an input device (for example, a keyboard, a mouse, a touch panel, etc.), or may receive information transmitted via a wired or wireless communication line.
  • The figure selection receiving unit 1041 may or may not include a device (for example, a modem or a network card) for reception.
  • the graphic selection receiving unit 1041 may be realized by hardware, or may be realized by software such as a driver that drives a predetermined device.
  • Display information acquisition section 1042 acquires display information from document information storage section 1012.
  • the display information is information related to the document information corresponding to the document graphic accepted by the graphic selection receiving unit 1041, and is information displayed on the image of the technical trend information.
  • Information related to document information is, for example, bibliographic information of document information. More specifically, it may be the title of the document information, the author, the publication year of the document, or the like.
  • the technical trend information creation unit 1018 creates technical trend information for displaying the display information acquired by the display information acquisition unit 1042.
  • The display information may be displayed in the vicinity of the document figure whose selection the figure selection receiving unit 1041 accepted, or, if a display area for the display information is set, it may be displayed in that area.
  • FIG. 47 is a flowchart showing processing executed after the technical trend information output process (step S1108) in the flowchart of FIG. That is, in the technology trend information creation device 1003 according to the present embodiment, the processing in steps S11401 to S11406 shown in FIG. 47 may be performed after the processing in step S1108.
  • Step S11401 The figure selection receiving unit 1041 determines whether or not the selection of a document figure has been received. If the selection of the document figure is accepted, the process proceeds to step S11402, and if not, the process proceeds to step S11405.
  • Step S11402 The display information acquisition unit 1042 acquires, from the document information storage unit 1012, the display information, which is information related to the document information corresponding to the document figure whose selection was accepted by the figure selection receiving unit 1041.
  • Step S11403 The technology trend information creation unit 1018 creates technology trend information including the display information acquired by the display information acquisition unit 1042.
  • Step S11404 The output unit 1019 outputs the technology trend information, including the display information, created by the technology trend information creation unit 1018. Then, the process returns to step S11401.
  • Step S11405 The output unit 1019 determines whether or not to end the output of the technology trend information. For example, the output unit 1019 may determine to end the output when a predetermined time has elapsed since the technology trend information was output, when an instruction to end the output of the technology trend information is received, or when an instruction to output other information is received. If the output of the technology trend information is to end, the process proceeds to step S11406. If not, the process returns to step S11401.
  • Step S11406 The output unit 1019 ends the output of the technology trend information. Then, the process returns to step S1101.
  • As shown in FIG. 40, it is assumed that an image of the technology trend information is displayed on the display.
  • Suppose that the mouse pointer is then placed on the leftmost document figure (circle) on the display corresponding to the minimum cost method.
  • the figure selection receiving unit 1041 determines that the selection of the document figure has been received (step S11401).
  • The figure selection receiving unit 1041 accesses the technical trend information creating unit 1018 to acquire the document ID “003” identifying the document information corresponding to the document figure at the position of the mouse pointer, and passes it to the display information acquisition unit 1042.
  • the display information acquisition unit 1042 acquires the title of the document information identified by the document ID “003” received from the figure selection reception unit 1041 as display information from the document information storage unit 1012 (step S11402).
  • the display information is passed to the technical trend information creation unit 1018.
  • The technology trend information creation unit 1018 creates the technology trend information so that the display information received from the display information acquisition unit 1042 is displayed in association with the position of the mouse pointer at that time (step S11403). Note that the technology trend information creation unit 1018 may receive the position of the mouse pointer from the figure selection reception unit 1041 or from another component.
  • The output unit 1019 outputs the technology trend information including the display information created as described above. As a result, the technology trend information shown in FIG. 48 is displayed on the display (step S11404).
  • When the user performs an operation to close the image of the technology trend information, the output unit 1019 determines that the output of the technology trend information is to end (step S11405), and ends the output (step S11406).
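  • The selection and lookup in this example can be sketched as follows. This is only an illustration under assumptions: each document figure is assumed to be recorded with its centre coordinates and document ID when the image is created, selection is a mouse-over within a fixed radius, and the coordinates, radius, record layout, and title text are all hypothetical.

```python
import math


def pick_document(figures, pointer, radius=5.0):
    """Return the document ID of the first document figure whose centre is
    within `radius` of the pointer position, or None if there is none."""
    px, py = pointer
    for figure in figures:
        if math.hypot(figure["x"] - px, figure["y"] - py) <= radius:
            return figure["doc_id"]
    return None


def display_info(doc_id, store, fields=("title",)):
    """Fetch the fields to be shown next to the pointer for one document."""
    record = store[doc_id]
    return {name: record[name] for name in fields}


# Hypothetical figure layout and stored document information.
figures = [{"doc_id": "003", "x": 40.0, "y": 20.0}]
store = {"003": {"title": "(title of document 003)", "author": "(author)"}}
selected_id = pick_document(figures, (42.0, 21.0))
info = display_info(selected_id, store)
```

  • Passing a longer `fields` tuple would add, for example, the author's name to the display information.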
  • the display information may include the author's name and the announcement time.
  • the announcement time may be time information.
  • In the technical trend information creation device 1003 according to this embodiment, when a document figure in the image of the technical trend information is selected, display information, which is information related to the document information corresponding to that document figure, is acquired, and an image of the technical trend information including the display information is created and output. A user viewing the image can therefore easily learn predetermined information about the document information containing the technical terms displayed in it, which improves user convenience.
  • The technical term extraction unit 1016 assigns the “Method” tag and the “Goal” tag to at least a part of the document information acquired by the document information acquisition unit 1015 in the same manner as described above.
  • In addition, it assigns “Restrict” tags to clue words expressing restriction, such as “in” and “related”, “Conj” tags to conjunctions such as “and” and “or”, and so on.
  • tags other than the “Method” tag, “Goal” tag, and “Head” tag described later are referred to as “Other” tags.
  • the case where the document information acquisition unit 1015 acquires the title of the technical document as part of the document information will be described.
  • a “Head” tag indicating the subject of the document information is also added.
  • the “Head” tag is attached to a noun phrase to which no “Method” tag or “Goal” tag is attached.
  • A “Head” tag is assigned to the noun phrase at the end of the acquired title.
  • The technical term extraction unit 1016 extracts, as a technical term, a term with the “Method” tag from a title in which the search term received by the search term receiving unit 1011 has the “Head” tag.
  • The technical term extraction unit 1016 also extracts, as a technical term, a term with the “Head” tag from a title in which the search term accepted by the search term acceptance unit 1011 has the “Goal” tag. By doing so, it is possible to extract technical terms indicating elemental technologies for the search terms.
  • Method 1 First, Method 1 will be described.
  • First, a list of terms that are inappropriate as the subject (for example, “research”, “examination”, “development”, “impact”, “analysis”, “consideration”, “evaluation”, “consideration”, “experimental research”, “examination”, “application”, “effects”, “suggestions”, etc.), here called the “unnecessary word list”, is prepared.
  • a method of creating this unnecessary word list will be described later.
  • the technical term extraction unit 1016 adds a “Head” tag, a “Method” tag, and the like to the title acquired by the document information acquisition unit 1015. Then, it is determined whether the term with the “Head” tag is included in the unnecessary word list.
  • If it is not included, the process of assigning the “Head” tag ends.
  • If it is included, the “Head” tag is removed from the term, and a noun phrase existing before the term is identified. If that noun phrase has neither the “Method” tag nor the “Goal” tag, the “Head” tag is assigned to it, and it is again determined whether the term with the “Head” tag is included in the unnecessary word list.
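  • One possible reading of Method 1 can be sketched as follows. This is a sketch under several assumptions: the title is given as a list of noun phrases in title order, each with an optional tag; the walk starts at the end of the title and moves toward the front; and noun phrases that already carry a “Method” or “Goal” tag are skipped rather than stopping the walk. The function name, the data layout, and the sample title are hypothetical.

```python
def assign_head(phrases, unnecessary_words):
    """phrases: list of {"text": str, "tag": str or None} in title order.
    Assigns "Head" to the last noun phrase that has no Method/Goal tag and
    is not in the unnecessary word list; returns its text, or None."""
    for phrase in reversed(phrases):
        if phrase["tag"] in ("Method", "Goal"):
            continue  # already tagged; not a candidate for Head
        if phrase["text"] in unnecessary_words:
            continue  # inappropriate as the subject; look further back
        phrase["tag"] = "Head"
        return phrase["text"]
    return None


# Hypothetical title, in source-language word order.
title = [{"text": "context tree", "tag": "Method"},
         {"text": "morphological analysis", "tag": None},
         {"text": "experiment", "tag": None}]
head = assign_head(title, {"experiment", "consideration"})
```

  • Here “experiment” is passed over because it is in the unnecessary word list, and the “Head” tag lands on “morphological analysis”.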
  • Method 2 In Method 2, data (teacher data) with appropriate “Head” tags is prepared and machine learning is performed. In this case, machine learning is performed in the same way for the “Method” and “Goal” tags as for the “Head” tag. In addition, by including in the teacher data information on whether or not each word is included in the unnecessary word list, it is considered that the “Head” tag will not be assigned to terms included in the unnecessary word list. For example, an SVM (support vector machine) may be used as the machine learning tool. By performing such machine learning, it is considered that inappropriate “Head” tags will not be assigned.
  • When the “Head” tag is assigned formally, it is assigned both to inappropriate terms such as “experiment” and “a consideration” and to appropriate terms such as “morphological analysis”. However, whereas the “Head” tag is assigned to terms such as “morphological analysis” only in papers in the natural language processing field, it is considered to be assigned to terms such as “experiment” and “a consideration” across a wide range of technical fields. Therefore, when the “Head” tag is assigned formally and the terms with the “Head” tag are sorted in order of frequency, unnecessary words such as “experiment” and “a consideration” are ranked higher. An unnecessary word list can be created by manually excluding technical terms from the higher-ranked terms.
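  • The frequency-based preparation of candidates can be sketched as follows. This is a minimal sketch, assuming the formally tagged “Head” terms have been collected into a flat list across many titles; the cut-off, the function name, and the sample counts are hypothetical, and the manual review step is represented only by a set supplied by a person.

```python
from collections import Counter


def unnecessary_word_candidates(head_terms, top_n):
    """Return the `top_n` most frequent formally assigned Head terms."""
    return [term for term, _ in Counter(head_terms).most_common(top_n)]


# Hypothetical Head terms gathered from titles in several fields.
head_terms = ["experiment", "morphological analysis", "experiment",
              "a consideration", "a consideration", "experiment", "parsing"]
candidates = unnecessary_word_candidates(head_terms, top_n=2)

# Manual step: a reviewer marks which candidates are real technical terms.
marked_as_technical = set()
unnecessary_word_list = [t for t in candidates if t not in marked_as_technical]
```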
  • FIG. 49 shows the results when tags are formally added
  • FIG. 50 shows the results when Method 1 is used
  • FIG. 51 shows the results when Method 2 is used.
  • Precision is a value indicating accuracy (accuracy rate) and serves as an index of accuracy. It is calculated by the following formula.
  • Recall is a value indicating the recall, and serves as an index of completeness. It can be calculated by the following formula.
  • the F value is a harmonic average of Precision and Recall, and is obtained by the following equation.
  • $F\ \text{value} = \dfrac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}$
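  • The three measures can be computed as in the following sketch. It assumes the standard definitions, which are an assumption here: Precision is the number of correctly assigned tags divided by the number of tags the system assigned, and Recall is the number of correctly assigned tags divided by the number of tags in the correct data. The function name and the sample sets are hypothetical.

```python
def precision_recall_f(assigned, gold):
    """assigned: items tagged by the system; gold: items tagged in the
    correct data. Returns (precision, recall, f_value)."""
    assigned, gold = set(assigned), set(gold)
    correct = len(assigned & gold)
    precision = correct / len(assigned) if assigned else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = recall + precision
    f_value = 2 * recall * precision / total if total else 0.0
    return precision, recall, f_value


# Hypothetical example: 4 tags assigned, 5 in the correct data, 3 in common.
precision, recall, f_value = precision_recall_f({1, 2, 3, 4}, {2, 3, 4, 5, 6})
```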
  • FIG. 52 is a block diagram showing a configuration of a technology trend information creation device 1004 that creates technology trend information in which technical terms are grouped for each classification.
  • The technology trend information creation device 1004 is the same as the technology trend information creation device 1001 of the fourth embodiment, except that it further includes a classification result information storage unit 1051 and creates the technology trend information using the classification result information stored in the classification result information storage unit 1051.
  • The classification result information is information indicating the result of classifying technical terms.
  • This technical term is a technical term that the technical term extraction unit 1016 may extract.
  • Fig. 53 in the classification result information, technical terms are grouped by classification. In FIG. 53, a classification name such as “algorithm” is given to the classification, but the classification name is given!
  • the classification result information may be created manually or automatically.
  • the classification result information storage unit 1051 can be realized by a predetermined recording medium (for example, a semiconductor memory, a magnetic disk, an optical disk, or the like).
  • the process of storing the classification result information in the classification result information storage unit 1051 is not limited.
  • For example, the classification result information may be stored in the classification result information storage unit 1051 via a recording medium, or classification result information transmitted via a communication line or the like may be stored in the classification result information storage unit 1051.
  • Alternatively, classification result information input via an input device may be stored in the classification result information storage unit 1051.
  • the technical trend information creation unit 1018 creates technical trend information in which technical terms are classified using the classification result information stored in the classification result information storage unit 1051.
  • the technical trend information is the same as in the fourth embodiment, etc., except that technical terms are classified.
  • The technology trend information creation unit 1018 refers to the classification result information and sorts the technical terms extracted by the technical term extraction unit 1016 into their classifications. For example, when the technical terms “minimum cost”, “neural net”, “HMM”, “tagged corpus”, and “untagged corpus” are extracted, the technology trend information creation unit 1018 refers to the classification result information shown in the figure and divides these technical terms into the classification “algorithm” and the classification “data”.
  • FIG. 54 is a diagram showing an example of the technology trend information created as described above.
  • Because the technical terms are grouped by classification, the user can easily grasp the technical trend for each classification. This is particularly effective when a large number of technical terms are extracted.
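The grouping of extracted technical terms by classification can be sketched as a dictionary lookup. This is a minimal sketch under stated assumptions: the table `CLASSIFICATION_RESULT` stands in for the stored classification result information, the assignment of each term to “algorithm” or “data” is a guess consistent with the example in the text, and the `unclassified` fallback is an addition for terms that have no entry.

```python
from collections import defaultdict

# Hypothetical classification result information: technical term -> classification.
CLASSIFICATION_RESULT = {
    "minimum cost": "algorithm",
    "neural net": "algorithm",
    "HMM": "algorithm",
    "tagged corpus": "data",
    "untagged corpus": "data",
}

def group_by_classification(terms, classification, default="unclassified"):
    """Group extracted technical terms under their classification."""
    groups = defaultdict(list)
    for term in terms:
        groups[classification.get(term, default)].append(term)
    return dict(groups)

extracted = ["minimum cost", "neural net", "HMM", "tagged corpus", "untagged corpus"]
grouped = group_by_classification(extracted, CLASSIFICATION_RESULT)
```

Each group can then be rendered as its own block of the technology trend information, so that trends are read per classification.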
  • Each process or each function may be realized by centralized processing by a single device or a single system, or may be realized by distributed processing by a plurality of devices or a plurality of systems.
  • The related term acquisition device and the technical trend information creation device may each be a stand-alone device or a server device constituting a server-client system.
  • the related term acquisition device may receive the term information transmitted from the client, or transmit the related term information to the client.
  • each component may be configured by dedicated hardware, or a component that can be realized by software may be realized by executing a program.
  • each component can be realized by a program execution unit such as a CPU reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.
  • The software that realizes the related term acquisition device according to Embodiments 1 to 3 described above is the following program. That is, this program causes a computer to execute the following steps:
  • a term information receiving step of receiving term information, which is information indicating a term;
  • a document information acquisition step of searching two or more pieces of document information stored in a document information storage unit, each being information indicating a document, for document information that has, in all or part of it, the term indicated by the term information received in the term information receiving step, and acquiring all of the retrieved document information, or a part of it that has at least cited document identification information, which is information identifying another document cited in the retrieved document information;
  • a cited document information acquisition step of searching the two or more pieces of document information stored in the document information storage unit for cited document information, which is information indicating the document identified by the cited document identification information, and acquiring the part of the retrieved cited document information that corresponds to a predetermined item;
  • a related term information acquisition step of acquiring related term information, which is information indicating related terms, from the acquired part of the cited document information; and
  • a related term information output step of outputting the related term information acquired in the related term information acquisition step.
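The search and citation-lookup steps of this program can be sketched over an in-memory document store. This is an illustrative sketch only: the store layout (`title`, `body`, `cites`), the choice of `title` as the predetermined item, plain substring matching as the search, and the sample documents are all assumptions, and the final related term extraction step is left out.

```python
# Hypothetical document information storage: id -> document information.
DOCS = {
    "D1": {"title": "Tagging with HMM", "body": "We follow D2 and D3.", "cites": ["D2", "D3"]},
    "D2": {"title": "Hidden Markov models for speech", "body": "Speech models.", "cites": []},
    "D3": {"title": "Viterbi decoding", "body": "Dynamic programming.", "cites": []},
    "D4": {"title": "Parsing", "body": "Chart parsing.", "cites": ["D3"]},
}

def search_documents(docs, term):
    """Document information acquisition: ids of documents containing the term."""
    return [i for i, d in docs.items() if term in d["title"] or term in d["body"]]

def cited_document_ids(docs, doc_ids):
    """Collect the cited document identification information, without duplicates."""
    cited = []
    for doc_id in doc_ids:
        for cid in docs[doc_id]["cites"]:
            if cid in docs and cid not in cited:
                cited.append(cid)
    return cited

def cited_item_texts(docs, term, item="title"):
    """Cited document information acquisition: the predetermined item of each cited document."""
    hits = search_documents(docs, term)
    return [docs[cid][item] for cid in cited_document_ids(docs, hits)]

texts = cited_item_texts(DOCS, "HMM")
```

Restricting the lookup to one item of the cited documents, rather than their full text, is what keeps the later related term extraction focused.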
  • the software that realizes the technical trend information creation device in the above embodiment is the following program.
  • That is, this program causes a computer to execute: a document information acquisition step of acquiring all or part of at least some of a plurality of pieces of document information stored in a document information storage unit, the storage unit storing at least the plurality of pieces of document information, each being information indicating at least a part of a technical document, and time information, which is information indicating a time related to the technical document indicated by the document information; a technical term extraction step of extracting technical terms from all or part of the document information acquired in the document information acquisition step; a time information acquisition step of acquiring, from the document information storage unit, the time information corresponding to the document information that includes the technical terms extracted in the technical term extraction step; a technology trend information creating step of creating technology trend information, which is information associating a technical term with a time related to the technical documents that include the technical term, using the technical terms extracted in the technical term extraction step and the time information acquired in the time information acquisition step; and an output step of outputting the technology trend information created in the technology trend information creating step.
  • This program may further cause the computer to execute a search term acceptance step of accepting a search term and a search step of searching the document information storage unit using the search term accepted in the search term acceptance step, and the document information acquisition step may acquire all or part of the document information retrieved in the search step.
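The steps of this program, including the optional search step, can be sketched as follows. This is a minimal sketch under stated assumptions: technical term extraction is replaced by matching against a fixed list `known_terms`, the time information is a publication year stored with each document, and the sample documents are invented.

```python
def search(docs, search_term):
    """Search step: keep documents whose text contains the search term (case-insensitive)."""
    return [d for d in docs if search_term.lower() in d["text"].lower()]

def extract_technical_terms(text, known_terms):
    """Stand-in for technical term extraction: match against a known term list."""
    return [t for t in known_terms if t in text]

def create_trend_information(docs, known_terms):
    """Associate each technical term with the years of the documents that contain it."""
    trend = {}
    for doc in docs:
        for term in extract_technical_terms(doc["text"], known_terms):
            trend.setdefault(term, set()).add(doc["year"])
    return {term: sorted(years) for term, years in trend.items()}

# Hypothetical document information with time information (year).
DOCS = [
    {"year": 1998, "text": "Part-of-speech tagging with an HMM."},
    {"year": 2001, "text": "Tagging with a neural net and an HMM baseline."},
    {"year": 2003, "text": "Parsing with a neural net."},
]
trend = create_trend_information(search(DOCS, "tagging"), ["HMM", "neural net"])
```

The earliest year in each list indicates roughly when a term started to appear in the retrieved documents, which is the kind of information the technology trend information is meant to convey.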
  • This program may be executed by being downloaded from a server or the like, or may be executed by reading a program recorded on a predetermined recording medium (for example, an optical disk such as a CD-ROM, a magnetic disk, or a semiconductor memory).
  • the computer that executes this program may be singular or plural. That is, centralized processing or distributed processing may be performed.
  • FIG. 55 is a schematic diagram showing an example of the appearance of a computer that executes the program and realizes the related term acquisition device and the technical trend information creation device according to the embodiment.
  • the above-described embodiment is realized by computer hardware and a computer program executed on the computer hardware.
  • The computer system 100 includes a computer 101 including a CD-ROM (Compact Disk Read Only Memory) drive 105 and an FD (Flexible Disk) drive 106, a keyboard 102, a mouse 103, and a monitor 104.
  • FIG. 56 is a diagram showing a computer system.
  • In addition to the CD-ROM drive 105 and the FD drive 106, the computer 101 includes a CPU (Central Processing Unit) 111, a ROM (Read Only Memory) 112 for storing programs such as a boot-up program, a RAM 113 that is connected to the CPU 111, temporarily stores the instructions of an application program, and provides a temporary storage space, a hard disk 114, and a bus 115 that interconnects the CPU 111, the ROM 112, and the like.
  • the computer 101 may include a network card (not shown) that provides connection to the LAN.
  • A program that causes the computer system 100 to execute the functions of the related term acquisition device and the technical trend information creation device according to the above-described embodiment may be stored in the CD-ROM 121 or the FD 122, inserted into the CD-ROM drive 105 or the FD drive 106, and transferred to the hard disk 114. Alternatively, the program may be transmitted to the computer 101 via a network (not shown) and stored in the hard disk 114. The program is loaded into the RAM 113 when executed. The program may also be loaded directly from the CD-ROM 121, the FD 122, or the network.
  • The program does not necessarily have to include an operating system (OS), a third-party program, or the like that causes the computer 101 to execute the functions of the related term acquisition device and the technical trend information creation device according to the above-described embodiment. The program may contain only those instructions that call the appropriate functions (modules) in a controlled manner and achieve the desired result. How the computer system 100 operates is well known and will not be described in detail.
  • As described above, the related term information related to the accepted term information can be acquired, so the present invention is useful as a device or the like that acquires related terms related to a certain term.
  • FIG. 1 is a block diagram showing a configuration of a related term acquisition device according to Embodiment 1 of the present invention.
  • FIG. 2 is a block diagram showing a configuration of a related term information acquisition unit according to the embodiment
  • FIG. 3 is a flowchart showing the operation of the related term acquisition apparatus according to the embodiment.
  • FIG. 4 is a flowchart showing the operation of the related term acquisition apparatus according to the embodiment.
  • FIG. 5 is a flowchart showing the operation of the related term acquisition apparatus according to the embodiment.
  • FIG. 6 is a flowchart showing the operation of the related term acquisition apparatus according to the embodiment.
  • FIG. 7 is a diagram for explaining an experimental example in the embodiment.
  • FIG. 8 is a diagram for explaining an experimental example in the embodiment.
  • FIG. 9 is a block diagram showing a configuration of a related term acquisition apparatus according to Embodiment 2 of the present invention.
  • FIG. 10 is a block diagram showing the configuration of a related term information acquisition unit according to the embodiment
  • FIG. 11 is a block diagram showing a configuration of a related term information acquisition unit according to the embodiment
  • FIG. 12 is a flowchart showing the operation of the related term information acquisition unit according to the third embodiment of the present invention.
  • FIG. 13 is a diagram showing an example of a type C clue phrase dictionary according to the embodiment.
  • FIG. 14 is a diagram showing an example of a type B clue phrase dictionary according to the embodiment.
  • FIG. 15 is a diagram showing an example of bibliographic information of document information in the embodiment.
  • FIG. 16 is a block diagram showing the configuration of the technology trend information creation device according to Embodiment 4 of the present invention.
  • FIG. 17 is a flowchart showing the operation of the technology trend information creation device according to the embodiment.
  • FIG. 18 is a flowchart showing the operation of the technology trend information creation device according to the embodiment.
  • FIG. 19 is a flowchart showing the operation of the technology trend information creation device according to the embodiment.
  • FIG. 20 is a flowchart showing the operation of the technology trend information creation device according to the embodiment.
  • FIG. 21
  • FIG. 22 is a block diagram showing a configuration of a technical term extraction unit in the embodiment.
  • FIG. 24 is a block diagram showing a configuration of a technical term extraction unit in the embodiment.
  • FIG. 25 is a flowchart showing the operation of the technical term extraction unit in the embodiment.
  • FIG. 26 is a block diagram showing a configuration of a technical term extraction unit in the embodiment.
  • FIG. 28 is a block diagram showing a configuration of a technical term extraction unit in the embodiment.
  • FIG. 29 is a flowchart showing the operation of the technical term extraction unit in the embodiment.
  • FIG. 30 is a block diagram showing a configuration of a technical term extraction unit in the embodiment.
  • FIG. 34 is a diagram showing an example of information stored in the document information storage unit in the embodiment.
  • FIG. 35 is a diagram showing an example of correspondence between type C clue words and positions of clue words in the embodiment.
  • FIG. 37 is a diagram showing an example of clue phrases stored in the clue phrase storage unit in the embodiment.
  • FIG. 42 is a diagram showing an example of technical terms stored in the technical term storage means in the embodiment.
  • FIG. 43 is a block diagram showing the configuration of the technology trend information creation device according to Embodiment 5 of the present invention.
  • FIG. 44 is a flowchart showing the operation of the technology trend information creation device according to the embodiment.
  • FIG. 45 is a flowchart showing the operation of the technology trend information creation device according to the embodiment.
  • FIG. 46 is a block diagram showing the configuration of the technology trend information creation device according to Embodiment 6 of the present invention.
  • FIG. 47 is a flowchart showing the operation of the technology trend information creation device according to the embodiment.
  • FIG. 48 is a diagram showing an example of the technology trend information output in the embodiment.
  • FIG. 55 is a schematic diagram showing an example of the external appearance of a computer system.
  • FIG. 56 is a diagram showing an example of the configuration of a computer system

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

[PROBLEMS] A related term acquiring device for acquiring a related term by using document citation relation. [MEANS FOR SOLVING PROBLEMS] The related term acquiring device comprises a document information storage section (11) storing document information, a term information receiving section (12) for receiving term information, a document information acquiring section (13) for acquiring a part of document information at least including cited document identification information for identifying other documents cited in the document information including term information, a cited document information acquiring section (14) for acquiring a part of the cited document information identified by the acquired cited document identification information and corresponding to a predetermined item from the document information storage section (11), a related term information acquiring section (15) for acquiring related term information related to the term information from the part of the cited document information, and a related term information output section (16) for outputting the related term information.

Description

Specification
Related term acquisition device, related term acquisition method, technology trend information creation device, technology trend information creation method, and program
Technical field
[0001] The present invention relates to, for example, a related term acquisition device or the like that acquires related terms related to a certain term, and to a technology trend information creation device or the like that creates technology trend information, which is information associating a technical term with a time related to the technical documents that include the technical term.
Background art
[0002] A conventional information processing apparatus searches Web pages on the Web and automatically collects related terms (see, for example, Non-Patent Documents 1 to 4). In general, collecting terms related to a technical term t from the Web requires first collecting descriptions of t and then extracting terms related to t from those descriptions. One key point is how to collect appropriate descriptions of the term t. For example, Non-Patent Document 1 collects descriptions of the term t as follows. For a term t, four kinds of queries, 「tとは」 ("what is t"), 「tという」 ("called t"), 「tは」 ("t is"), and 「t」, are first submitted to a search engine, and the top 100 pages of the URLs returned for each query are obtained. Next, the obtained pages are formatted and split into sentences, only the sentences containing the term t are extracted, and related terms are collected from them.
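The query construction and sentence filtering attributed to Non-Patent Document 1 can be sketched as follows. This is a rough sketch, not that paper's implementation: no search engine is called, the pages are passed in as plain strings, and splitting sentences on 「。」 and ASCII sentence punctuation is an assumption.

```python
import re

def build_queries(term):
    """Build the four query patterns for a term t: tとは, tという, tは, t."""
    return [f"{term}とは", f"{term}という", f"{term}は", term]

def sentences_containing(pages, term):
    """Split each page into sentences and keep only those containing the term."""
    kept = []
    for page in pages:
        # A sentence is a run of non-terminator characters plus an optional terminator.
        for sentence in re.findall(r"[^。.!?]+[。.!?]?", page):
            sentence = sentence.strip()
            if sentence and term in sentence:
                kept.append(sentence)
    return kept

# Hypothetical page texts standing in for retrieved Web pages.
pages = [
    "HMMとは確率モデルである。今日は晴れ。",
    "音声認識ではHMMというモデルを使う。",
]
queries = build_queries("HMM")
sentences = sentences_containing(pages, "HMM")
```

Because one search per query per term is needed before any sentence can be filtered, this style of collection scales with the number of terms, which is the cost the citation-based approach aims to avoid.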
[0003] In the prior art, there is also a survey paper writing support system that takes reference information between papers into account (Non-Patent Document 5).
In recent years, research on automatically collecting terms related to a certain term from a text collection has been actively conducted (see Non-Patent Documents 1, 3, 4, and 6).
Non-Patent Document 1: Satoshi Sato and one other, “Automatic collection of related terms using the Web”, IPSJ SIG Technical Report, Natural Language Processing, NL-153, pp. 57-64, 2003
Non-Patent Document 2: Yasuhiro Sasaki and two others, “Proposal of a measure of the degree of association between terms”, 10th Annual Meeting of the Association for Natural Language Processing, pp. 25-28, 2004
Non-Patent Document 3: Kiyoaki Shirai and three others, “An attempt at automatic portal site creation”, 10th Annual Meeting of the Association for Natural Language Processing, pp. 624-627, 2004
Non-Patent Document 4: Kyosuke Ohara and three others, “Collecting related terms using the Web”, 3rd Forum on Information Technology (FIT2004), 2004
Non-Patent Document 5: Hidetsugu Nanba and one other, “Development of a survey paper writing support system that takes reference information between papers into account”, Journal of Natural Language Processing, Vol. 6, No. 5, pp. 43-62, 1999
Non-Patent Document 6: Hidetsugu Nanba, “Automatic collection of related terms using citation information between papers”, 11th Annual Meeting of the Association for Natural Language Processing, 2005
Disclosure of the invention
Problems to be solved by the invention
[0004] However, conventional information processing apparatuses do not exploit the citation relationships between documents. For each term, they collect related pages with a Web search engine and extract related terms from those pages, so collecting related terms takes time. In addition, because they do not exploit citation relationships, the accuracy of related term collection is low.
[0005] Furthermore, collecting and organizing information on what technologies have been used in a research field, and since when, is indispensable for surveying the research trends in that field. However, collecting such information requires a great deal of time and effort from a person familiar with the field.
[0006] As described above, the non-patent documents cited above can automatically collect terms related to a certain term. However, they cannot create information indicating since when a technical term has been in use.
[0007] The present invention has been made to solve at least one of the above problems, and an object thereof is to provide a related term acquisition device and the like that can acquire appropriate related terms by using the citation relationships between documents.
[0008] Another object of the present invention, which has likewise been made to solve at least one of the above problems, is to provide a technology trend information creation device and the like that can create technology trend information, which is information associating a technical term with a time related to the technical documents that include the technical term.
Means for solving the problem
[0009] In order to achieve at least one of the above objects, a related term acquisition device according to the present invention includes: a document information storage unit that stores two or more pieces of document information, each being information indicating a document; a term information receiving unit that receives term information, which is information indicating a term; a document information acquisition unit that searches the two or more pieces of document information stored in the document information storage unit for document information having, in all or part of it, the term indicated by the term information received by the term information receiving unit, and acquires from the document information storage unit all of the retrieved document information, or a part of the retrieved document information that has at least cited document identification information, which is information identifying another document cited in the retrieved document information; a cited document information acquisition unit that searches the two or more pieces of document information stored in the document information storage unit for cited document information, which is information indicating the document identified by the cited document identification information included in all or part of the document information acquired by the document information acquisition unit, and acquires from the document information storage unit a part of the retrieved cited document information that corresponds to a predetermined item; a related term information acquisition unit that acquires, from the part of the cited document information acquired by the cited document information acquisition unit, related term information, which is information indicating a related term related to the term indicated by the term information received by the term information receiving unit; and a related term information output unit that outputs the related term information acquired by the related term information acquisition unit.
[0010] With this configuration, related term information related to the term information can be acquired. In particular, because the cited document information acquisition unit acquires only a part of the cited document information, more appropriate related term information can be acquired.
[0011] In the related term acquisition device according to the present invention, the document information may have structure description information, which is information describing the structure of the document information, and the cited document information acquisition unit may acquire a part of the cited document information indicating the cited document identified by the cited document identification information included in all or part of the document information acquired by the document information acquisition unit, the part being associated with a predetermined item by the structure description information.
[0012] With this configuration, because the document information has structure description information, the cited document information acquisition unit can easily acquire the part corresponding to the predetermined item.
[0013] The related term acquisition device according to the present invention may further include a cited document structure information acquisition unit that acquires cited document structure information, which is information indicating the correspondence between one or more parts of the cited document information and the items of those parts, and the cited document information acquisition unit may use the cited document structure information acquired by the cited document structure information acquisition unit to acquire a part of the cited document information that corresponds to a predetermined item.
[0014] With this configuration, the cited document information acquisition unit can easily acquire the part corresponding to the predetermined item by using the cited document structure information acquired by the cited document structure information acquisition unit.
[0015] In the related term acquisition device according to the present invention, when acquiring a part of the document information, the document information acquisition unit may acquire a part that has at least the cited document identification information and at least citation information, which is information indicating a description of the document identified by the cited document identification information, and the cited document information acquisition unit may acquire a part of the cited document information, which is information indicating the document identified by the cited document identification information included in all or part of the document information acquired by the document information acquisition unit, the part corresponding to an item predetermined according to the citation format of the citation information included in all or part of the document information acquired by the document information acquisition unit.
[0016] With this configuration, the cited document information acquisition unit can acquire the part corresponding to the item for each citation format, and by appropriately setting which item's part of the cited document information is acquired for each citation format, more appropriate related term information can be acquired.
[0017] In the related term acquisition device according to the present invention, the related term information acquisition unit may include: related term candidate information acquisition means for acquiring related term candidate information, which is information that is a candidate for a related term, from the part of the cited document information acquired by the cited document information acquisition unit; relevance calculation means for calculating, for each piece of related term candidate information, the relevance of the term indicated by the related term candidate information acquired by the related term candidate information acquisition means; and related term information determination means for determining, as the related term information, those pieces of the acquired related term candidate information whose corresponding relevance ranks high.
With this configuration, related term information can be appropriately acquired from the part of the cited document information acquired by the cited document information acquisition unit.
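The three means described above (candidate acquisition, relevance calculation, and selection of the top-ranked candidates) can be sketched as follows. This is a sketch under stated assumptions: the cited document parts are taken to be already split into candidate terms, and relevance is approximated by how often a candidate occurs across those parts, which is an assumption made for illustration and not a measure stated in this passage.

```python
from collections import Counter

def get_candidates(parts, term):
    """Candidate acquisition: every term in the cited document parts except the query term."""
    return [c for part in parts for c in part if c != term]

def calculate_relevance(candidates):
    """Relevance calculation (assumed measure): occurrence count of each candidate."""
    return Counter(candidates)

def decide_related_terms(parts, term, top_k=2):
    """Determination: keep the top_k candidates by relevance."""
    scores = calculate_relevance(get_candidates(parts, term))
    return [candidate for candidate, _ in scores.most_common(top_k)]

# Hypothetical candidate terms taken from three cited document parts.
parts = [
    ["HMM", "Viterbi", "speech"],
    ["Viterbi", "HMM", "tagging"],
    ["Viterbi", "speech"],
]
related = decide_related_terms(parts, "HMM")
```

Any other relevance measure can be substituted in `calculate_relevance` without changing the surrounding steps.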
[0018] In the related term acquisition device according to the present invention, the document information storage unit may store at least a first type of document information and a second type of document information; the document information acquisition unit may acquire all or part of document information of the first type from the first type of document information stored in the document information storage unit; and the cited document information acquisition unit may acquire a part of the cited document information from the second type of document information stored in the document information storage unit.
[0019] With such a configuration, it is possible to acquire related terms that are related to a given term but belong to a different field from that term. For example, patent terms related to a term used in academic literature can be acquired.
[0020] To achieve at least one of the above objects, a technical trend information creation device according to the present invention comprises: a document information storage unit that stores at least a plurality of pieces of document information, each being information indicating at least a part of a technical document, and period information, which is information indicating a period relating to the technical document indicated by the document information; a document information acquisition unit that acquires all or part of at least some of the plurality of pieces of document information stored in the document information storage unit; a technical term extraction unit that extracts technical terms from all or part of the document information acquired by the document information acquisition unit; a period information acquisition unit that acquires, from the document information storage unit, the period information corresponding to the document information containing the technical terms extracted by the technical term extraction unit; a technical trend information creation unit that uses the technical terms extracted by the technical term extraction unit and the period information acquired by the period information acquisition unit to create technical trend information, which is information associating a technical term with a period relating to technical documents containing that technical term; and an output unit that outputs the technical trend information created by the technical trend information creation unit.
[0021] With such a configuration, technical trend information, which associates a technical term with a period relating to technical documents containing that term, can be created. As a result, by using the created technical trend information, the evolution of the technologies used in a given research field can be readily understood. There is also the advantage that creating the technical trend information does not require a great deal of labor and time from people well versed in that research field.
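As a non-limiting illustration, the association of technical terms with periods could be sketched in Python as follows. The dictionary keys `text` and `year`, the function name, and the `extract_terms` callback are assumptions made for this sketch and are not taken from the specification.

```python
from collections import defaultdict


def create_trend_information(documents, extract_terms):
    """Map each technical term to the periods of the documents containing it.

    `documents` is a list of dicts with hypothetical keys "text" and "year";
    `extract_terms` is any callable returning the terms found in a text.
    """
    trend = defaultdict(set)
    for doc in documents:
        # Technical term extraction followed by period lookup for that document.
        for term in extract_terms(doc["text"]):
            trend[term].add(doc["year"])
    # Sorted periods make the evolution of each term easy to read.
    return {term: sorted(years) for term, years in trend.items()}
```

The returned mapping is one possible data form of technical trend information; it could equally be rendered as a table or an image.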
[0022] The technical trend information creation device according to the present invention may further comprise a search term reception unit that receives a search term, and a search unit that searches the document information storage unit using the search term received by the search term reception unit, and the document information acquisition unit may acquire all or part of the document information found by the search.
With such a configuration, the research field for which technical trend information is created can be specified by the search term.
[0023] The technical trend information creation device according to the present invention may further comprise a search term storage unit that stores two or more search terms, and a search term selection reception unit that receives a selection of one of the two or more search terms stored in the search term storage unit, and the search term reception unit may receive the search term indicated by the selection received by the search term selection reception unit.
[0024] With such a configuration, storing terms that are suitable as search terms (for example, terms indicating research fields) in the search term storage unit makes it possible to avoid a situation in which an unsuitable search term is entered. As a result, an appropriate search can be performed and appropriate technical trend information can be created.
[0025] The technical trend information creation device according to the present invention may further comprise: a specialized term extraction unit that extracts specialized terms from the document information stored in the document information storage unit; a suffix storage unit that stores predetermined suffixes; and a search term accumulation unit that, when a term formed by appending a suffix stored in the suffix storage unit to a specialized term extracted by the specialized term extraction unit is contained in the document information stored in the document information storage unit, accumulates that specialized term in the search term storage unit as a search term.
With such a configuration, search terms can be accumulated in the search term storage unit automatically, without a user or the like having to perform the accumulation.
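As a non-limiting illustration, the suffix check performed by the search term accumulation unit could be sketched in Python as follows. The function name and the representation of document information as plain strings are assumptions made for this sketch; the concrete suffixes are not specified in the paragraph above.

```python
def accumulate_search_terms(specialized_terms, suffixes, documents):
    """Return the specialized terms that qualify as search terms.

    A term qualifies when "term + suffix" appears in at least one document.
    `documents` is assumed to be a list of plain-text strings.
    """
    stored = []
    for term in specialized_terms:
        found = any(
            term + suffix in text
            for suffix in suffixes
            for text in documents
        )
        # Avoid storing the same search term twice.
        if found and term not in stored:
            stored.append(term)
    return stored
```

For example, with the hypothetical suffix " system", the term "machine translation" would be stored if some document contains "machine translation system".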
[0026] In the technical trend information creation device according to the present invention, the search unit may search for document information corresponding to technical documents whose main topic is the search term received by the search term reception unit.
With such a configuration, technical trend information can be created for the field whose main topic is the search term.
[0027] The technical trend information creation device according to the present invention may further comprise a specifying unit that specifies document information in the same field as the document information found by the search unit, and the document information acquisition unit may also acquire, from the document information storage unit, all or part of the document information specified by the specifying unit.
[0028] With such a configuration, it is possible to acquire other document information that cannot be obtained by a search using the search term but is in the same field as the document information obtained by that search. As a result, more document information can be collected, and more appropriate technical trend information can be expected to be created.
[0029] In the technical trend information creation device according to the present invention, the specifying unit may acquire, from the document information found by the search unit, cited document identification information, which is information identifying other technical documents cited in that document information, and may use the cited document identification information to specify document information in the same field as the document information found by the search unit.
With such a configuration, document information can be specified using citation relationships.
[0030] In the technical trend information creation device according to the present invention, the specifying unit may also acquire, from the document information found by the search unit, citation information, which is information indicating the description of another technical document cited in that document information, and, when the citation form indicated by the citation information is a problem-pointing citation form, that is, a form that points out a problem in the cited technical document, may use the cited document identification information identifying the technical document described in that citation information to specify document information in the same field as the document information found by the search unit.
[0031] With such a configuration, document information can be specified using citation information as well. In particular, the inventors' research has shown that when the citation form is a problem-pointing citation form, which points out a problem in the cited technical document, the field of the cited document information is similar to that of the citing document information, so that appropriate document information can be specified.
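As a non-limiting illustration, the use of a problem-pointing citation form to specify same-field documents could be sketched in Python as follows. How the citation form is actually determined is not stated in the paragraphs above; the cue words, the function name, and the `(cited_id, citing_sentence)` pair representation are all assumptions made for this sketch.

```python
# Hypothetical cue words suggesting that a citing sentence points out a problem.
PROBLEM_CUES = ("however", "problem", "cannot", "fails to")


def specify_same_field_documents(citations, cues=PROBLEM_CUES):
    """Return IDs of cited documents whose citation is problem-pointing.

    `citations` is a list of (cited_id, citing_sentence) pairs, where
    cited_id plays the role of cited document identification information
    and citing_sentence plays the role of citation information.
    """
    same_field = []
    for cited_id, sentence in citations:
        lowered = sentence.lower()
        is_problem_pointing = any(cue in lowered for cue in cues)
        if is_problem_pointing and cited_id not in same_field:
            same_field.append(cited_id)
    return same_field
```

A production classifier would likely be more elaborate than a cue-word test; the sketch only shows where the citation form enters the selection.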
[0032] In the technical trend information creation device according to the present invention, the technical term extraction unit may include: clue phrase storage means for storing one or more clue phrases; clue phrase search means for searching for the clue phrases in all or part of the document information acquired by the document information acquisition unit; and technical term extraction means for extracting, as a technical term, a term having a predetermined relationship with a clue phrase found by the clue phrase search means.
With such a configuration, technical terms can be extracted from all or part of the acquired document information using the clue phrases.
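As a non-limiting illustration, clue-phrase-based extraction could be sketched in Python as follows. The "predetermined relationship" is assumed here to be "the text that immediately follows the clue phrase, up to the next punctuation mark"; this rule, the function name, and the example clue phrases are assumptions made for this sketch.

```python
import re


def extract_technical_terms(text, clue_phrases):
    """Extract the phrase that follows each clue phrase.

    Assumption: a technical term is whatever follows a clue phrase
    up to the next comma, period, or semicolon.
    """
    terms = []
    for clue in clue_phrases:
        # re.escape keeps clue phrases with special characters literal.
        pattern = re.escape(clue) + r"\s+([^,.;]+)"
        for match in re.finditer(pattern, text):
            terms.append(match.group(1).strip())
    return terms
```

For instance, with the hypothetical clue phrases "based on" and "using", the sentence "We propose a parser based on support vector machines, and a tagger using hidden Markov models." would yield "support vector machines" and "hidden Markov models".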
[0033] In the technical trend information creation device according to the present invention, the technical term extraction unit may include: clue phrase storage means for storing one or more clue phrases; clue phrase search means for searching for the clue phrases in all or part of the document information acquired by the document information acquisition unit; and technical term extraction means for extracting, as a technical term, a term that is associated with the search term received by the search term reception unit by a clue phrase found by the clue phrase search means.
With such a configuration, technical terms can be extracted from all or part of the acquired document information using the clue phrases and the search term.
[0034] In the technical trend information creation device according to the present invention, the technical term extraction unit may include: clue phrase storage means for storing one or more clue phrases; clue phrase search means for searching for the clue phrases in all or part of the document information acquired by the document information acquisition unit; specialized term storage means for storing two or more specialized terms; and technical term extraction means for extracting, when a term having a predetermined relationship with a clue phrase found by the clue phrase search means contains a specialized term stored in the specialized term storage means, that specialized term as a technical term.
With such a configuration, the technical term extraction means can be prevented from extracting common nouns, non-specialized terms, and the like as technical terms.
[0035] In the technical trend information creation device according to the present invention, the technical term extraction unit may include: clue phrase storage means for storing one or more clue phrases; clue phrase search means for searching for the clue phrases in all or part of the document information acquired by the document information acquisition unit; technical term candidate extraction means for extracting, as a technical term candidate, a term having a predetermined relationship with a clue phrase found by the clue phrase search means; and technical term selection means for selecting all or some of the technical term candidates extracted by the technical term candidate extraction means as technical terms.
With such a configuration, technical term candidates are first extracted using the clue phrases, and appropriate technical terms can then be selected from among those candidates.
[0036] In the technical trend information creation device according to the present invention, the technical term extraction unit may further include technical term suffix storage means for storing one or more suffixes that technical terms can take, and the technical term selection means may select, as technical terms, the technical term candidates having a suffix stored in the technical term suffix storage means.
[0037] Since a term having a predetermined suffix is considered suitable as a technical term, selecting technical terms by means of such suffixes makes it possible, with this configuration, to select appropriate technical terms.
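As a non-limiting illustration, suffix-based selection from technical term candidates could be sketched in Python as follows. The function name and the example suffixes are assumptions made for this sketch; the stored suffixes are not listed in the paragraphs above.

```python
def select_technical_terms(candidates, suffixes):
    """Keep only the candidates that end with one of the stored suffixes."""
    # str.endswith accepts a tuple and returns True if any suffix matches.
    suffix_tuple = tuple(suffixes)
    return [term for term in candidates if term.endswith(suffix_tuple)]
```

With the hypothetical suffixes "model" and "algorithm", the candidates "hidden Markov model" and "beam search algorithm" would be selected, while "previous work" would be discarded.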
[0038] In the technical trend information creation device according to the present invention, the technical term selection means may select, as technical terms, the technical term candidates that are used as the main topic in any of the document information stored in the document information storage unit.
[0039] Since a term that is not used as the main topic in any document information is considered unsuitable as a technical term, selecting the technical term candidates that are used as the main topic in some document information makes it possible, with this configuration, to select appropriate technical terms.
[0040] In the technical trend information creation device according to the present invention, the technical term extraction unit may further include document information specifying means for specifying document information in the same field as the document information from which the technical term candidates were extracted, and technical term suffix storage means for storing one or more suffixes that technical terms can take; the document information acquisition unit may also acquire all or part of the document information specified by the document information specifying means; and the technical term selection means may select the technical term candidates extracted from all or part of one piece of document information as technical terms when at least one of the technical term candidates extracted from all or part of that one piece of document information acquired by the document information acquisition unit, and from all or part of the document information specified by the document information specifying means as being in the same field as that one piece of document information, has a suffix stored in the suffix storage means.
With such a configuration, appropriate technical terms can be selected.
[0041] In the technical trend information creation device according to the present invention, the technical trend information may be an image showing, for each technical term, the period indicated by the period information corresponding to the document information containing that technical term, and the output unit may display the technical trend information.
With such a configuration, the user can readily grasp technical trends by viewing the displayed image of the technical trend information.
[0042] In the technical trend information creation device according to the present invention, the technical trend information may be an image in which, for each technical term, a document figure, which is a figure corresponding to document information containing that technical term, is displayed according to the period indicated by the period information corresponding to that document information; the device may further comprise a figure selection reception unit that receives a selection of a document figure in the technical trend information, and a display information acquisition unit that acquires, from the document information storage unit, display information, which is information about the document information corresponding to the document figure whose selection was received by the figure selection reception unit and which is to be displayed in the image of the technical trend information; and the technical trend information creation unit may create technical trend information that displays the display information acquired by the display information acquisition unit.
[0043] With such a configuration, for example, a user viewing the image of the technical trend information can readily learn predetermined information about the document information containing a technical term displayed in that image, which improves user convenience.
EFFECTS OF THE INVENTION
[0044] According to the related term acquisition device and the like of the present invention, related term information related to term information can be acquired. Moreover, for example, acquiring the part corresponding to an item that depends on the citation form and obtaining related term information from that part can make it possible to acquire more appropriate related term information.
[0045] According to the technical trend information creation device and the like of the present invention, technical trend information, which associates a technical term with a period relating to technical documents containing that term, can be created. Therefore, by using the technical trend information, the evolution of the technologies used in a given research field can be readily understood.
BEST MODE FOR CARRYING OUT THE INVENTION
[0046] Hereinafter, the related term acquisition device and the technical trend information creation device according to the present invention will be described using embodiments. In the following embodiments, components and steps denoted by the same reference numerals are identical or equivalent, and repeated description of them may be omitted.
[0047] (Embodiment 1)
A related term acquisition device according to Embodiment 1 of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing the configuration of the related term acquisition device 1 according to this embodiment. In FIG. 1, the related term acquisition device 1 according to this embodiment includes a document information storage unit 11, a term information reception unit 12, a document information acquisition unit 13, a cited document information acquisition unit 14, a related term information acquisition unit 15, and a related term information output unit 16.
[0048] The document information storage unit 11 stores two or more pieces of document information. Here, document information is information representing a document. The document may be any document that forms some kind of coherent unit, such as a paper, a patent document (for example, a published application or a patent gazette), a Web page, an official gazette, or a book. One piece of document information contains the information (data) of the one corresponding document. However, a piece of document information may contain only the range of information needed for later processing. For example, for the document information of an academic paper, if only the references and the title of the paper are used in later processing, the document information stored in the document information storage unit 11 may consist of the references and the title alone. The document information may be a text file, or a file in a proprietary format such as that of word-processing software. When the document information is a text file, it may be written in a markup language that describes documents using tags, or the content of the corresponding document may simply be represented as text data. The document information is assumed to have structure in at least part of it. This structure may or may not be made explicit by tags, headings, or the like. In the latter case, for example, reading one paragraph of the document indicated by the document information may reveal that it contains the abstract, and reading another paragraph may reveal that it states the purpose. When the structure is explicit, it may or may not be indicated by structure description information, which is information contained in the document information that describes the structure of the document information. Here, structure description information is, for example, the heading tags used in Japanese patent specifications or the tags used in HTML files. A case in which the structure of the document information is described by something other than structure description information is, for example, one in which the document information contains the text data of headings such as "Abstract" or "References," as in an ordinary paper. The two or more pieces of document information stored in the document information storage unit 11 include at least document information representing a document that cites another document. Another document cited in a piece of document information is referred to as a cited document. Document information representing a document that cites another document contains cited document identification information, which is information identifying the cited document. The content of the cited document identification information does not matter as long as it can identify the cited document. The cited document identification information may be, for example, an ID identifying the cited document, such as a patent number, the publication number of a patent application, or a PMID (PubMed Identifier), or it may be bibliographic information capable of identifying the cited document, such as the author name, title, date of authorship, or information about the medium (journal name, publisher name, URL, etc.). The document information storage unit 11 can be realized by a predetermined recording medium (for example, a semiconductor memory, a magnetic disk, or an optical disk). The process by which document information comes to be stored in the document information storage unit 11 does not matter. For example, document information may come to be stored in the document information storage unit 11 via a recording medium, document information transmitted via a communication line or the like may be stored there, or document information entered via an input device may be stored there.
The term information reception unit 12 receives term information. Here, term information is information representing a term. Term information is generally information representing the text of a term, but may be anything else as long as it represents the term. The related term acquisition device 1 acquires related terms, which are terms related to the term indicated by the term information received by the term information reception unit 12. The term information reception unit 12 may, for example, receive information entered from an input device (for example, a keyboard, a mouse, or a touch panel), receive information transmitted via a wired or wireless communication line, or receive information read from a predetermined recording medium (for example, an optical disk, a magnetic disk, or a semiconductor memory). The term information reception unit 12 may or may not include a device for performing the reception (for example, a modem or a network card). The term information reception unit 12 may be realized by hardware, or by software such as a driver that drives a predetermined device.
[0050] The term information reception unit 12 may also receive information other than term information. For example, when some kind of search is to be performed using the related term information acquired by the related term acquisition device 1, a search key used in that search (for example, an International Patent Classification code) may be received together with the term information. Even in such a case, the processing described below, such as the acquisition of document information, is performed using the term information received by the term information reception unit 12.
[0051] The document information acquisition unit 13 searches the two or more pieces of document information stored in the document information storage unit 11 for document information that contains, in all or part of the document information, the term indicated by the term information received by the term information reception unit 12. Document information that contains the term in all of the document information means document information in which the term appears somewhere. Document information that contains the term in part of the document information means document information in which the term appears in a particular region, for example a predetermined region such as the title or the claims. When the document information acquisition unit 13 searches for document information containing the term in part of the document information, information indicating the item corresponding to that part may, for example, be stored in a recording medium (not shown), and the document information acquisition unit 13 may read the information indicating the item from that recording medium and perform the search.
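As a non-limiting illustration, the distinction between searching all of the document information and searching only a predetermined part could be sketched in Python as follows. The dictionary layout with the keys `id` and `sections`, the item names, and the function name are assumptions made for this sketch.

```python
def search_document_information(documents, term, items=None):
    """Return the IDs of documents that contain `term`.

    `documents` is a list of dicts with hypothetical keys "id" and
    "sections" (a mapping from item name to text). When `items` is None
    every section is searched; otherwise only the listed items are.
    """
    hits = []
    for doc in documents:
        sections = doc["sections"]
        if items is None:
            targets = list(sections.values())
        else:
            # Missing items are treated as empty text.
            targets = [sections.get(item, "") for item in items]
        if any(term in text for text in targets):
            hits.append(doc["id"])
    return hits
```

Passing `items=["title"]` corresponds to searching only a part of the document information, whereas omitting `items` corresponds to searching all of it. The match is case-sensitive in this sketch.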
[0052] The document information acquisition unit 13 acquires, from the document information storage unit 11, all of the retrieved document information or a part of it. When the document information acquisition unit 13 acquires a part of the retrieved document information, that part includes at least cited document identification information, which is information identifying the other documents cited in the retrieved document information. The document information acquisition unit 13 therefore acquires all or part of the document information, including at least the cited document identification information. When acquiring a part of the document information, the document information acquisition unit 13 may, for example, acquire a section considered to contain at least the cited document identification information, such as a References section or a patent literature or non-patent literature section. Alternatively, it may use a technique such as pattern matching to locate the region in which the cited document identification information exists and acquire a part of the document information that contains that region. In the former case, for example, information indicating the item corresponding to the part to be acquired may be stored in a recording medium (not shown), and the document information acquisition unit 13 may read the information indicating the item from that recording medium and acquire the part of the document information corresponding to that item. In the latter case, for example, information indicating the patterns used for pattern matching may be stored in a recording medium (not shown), and the document information acquisition unit 13 may read the patterns from that recording medium and acquire the part of the document information by pattern matching. When the document information acquisition unit 13 both searches for document information having the term in a part of the document information and acquires a part of the retrieved document information, the part used in the search and the part to be acquired may be the same part of the document information or different parts.
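The search and partial acquisition described in paragraphs [0051] and [0052] can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the in-memory dictionary standing in for the document information storage unit 11, the field names, the document identifiers, and the citation pattern are all assumptions made for the example.

```python
import re

# Hypothetical stand-in for the document information storage unit 11.
DOCUMENTS = {
    "D1": {"title": "Virtual vibration generating device",
           "claims": "A device that generates vibration for an operator.",
           "references": "See D2 and D3."},
    "D2": {"title": "Load detection means",
           "claims": "A sensor that detects load.",
           "references": ""},
    "D3": {"title": "Frequency converter",
           "claims": "A converter producing vibration signals.",
           "references": "See D2."},
}

# Assumed format of cited document identification information.
CITATION_PATTERN = re.compile(r"\bD\d+\b")


def search_documents(term, documents, fields=None):
    """Return IDs of documents that contain `term`.

    fields=None searches every field ("all of the document information");
    a list of field names restricts the search to those predetermined parts.
    """
    hits = []
    for doc_id, doc in documents.items():
        names = fields if fields is not None else doc.keys()
        if any(term in doc.get(name, "") for name in names):
            hits.append(doc_id)
    return hits


def extract_cited_ids(doc, field="references"):
    """Acquire only the part holding citations and pattern-match the IDs in it."""
    return CITATION_PATTERN.findall(doc.get(field, ""))
```

Passing `fields=["title"]` corresponds to searching only a predetermined part, and `extract_cited_ids` combines the two options in the text: it reads a fixed section and then applies pattern matching inside it.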
[0053] The cited document information acquisition unit 14 searches the two or more pieces of document information stored in the document information storage unit 11 for cited document information. Here, cited document information is document information representing a document identified by the cited document identification information contained in all or part of the document information acquired by the document information acquisition unit 13. The cited document information acquisition unit 14 then acquires a part of the retrieved cited document information from the document information storage unit 11. The part of the cited document information acquired by the cited document information acquisition unit 14 is the part corresponding to a predetermined item, that is, the part in which that item is described. An item is a region into which a document can be divided, for example, the title, the title of the invention, the abstract, the claims, a claim, the preamble of a claim, the claim listed first (hereinafter the "first claim"), a claim dependent on the first claim, the object of the invention, or the effects of the invention. The predetermined item may be a single item or two or more items. The item may or may not be explicit in the cited document information. An item is explicit when the part corresponding to it is clearly marked in the cited document information. For example, the item may be made explicit by displaying the item name as text, or by using tags. An item is not explicit when, unlike the explicit case, the part corresponding to it is not clearly marked in the cited document information. This applies, for example, when a document is written without itemization. Even in such a case, the cited document information acquisition unit 14 may be able to determine which items are described in a given part of the document.
[0054] The process by which the cited document information acquisition unit 14 acquires the part of the cited document information corresponding to a predetermined item is described here. When information indicating the predetermined item is stored in a recording medium (not shown) accessible to the cited document information acquisition unit 14, the unit accesses that recording medium and reads the information indicating the item. If the item is explicitly marked in the cited document information, the cited document information acquisition unit 14 acquires the part of the cited document information corresponding to the item it has read. If the item is not explicitly marked, the cited document information acquisition unit 14 identifies the part of the cited document information corresponding to the item and acquires the identified part. Identification of the part corresponding to an item is described next. For example, the cited document information acquisition unit 14 holds, in a recording medium (not shown), condition information indicating the condition corresponding to each item to be identified. To identify the part of the cited document information corresponding to a given item, the cited document information acquisition unit 14 acquires the condition information for that item and identifies the part of the cited document information that corresponds to that condition information.
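The two cases in paragraph [0054] can be sketched as follows, under stated assumptions: an explicitly marked item is taken to be wrapped in an XML-like tag named after the item, and the condition information for an unmarked item is modelled as a function that locates the part. The tag convention and the "first non-empty line is the title" rule are hypothetical and chosen only for illustration.

```python
import re


def first_line(text):
    """Hypothetical condition: the title is the first non-empty line."""
    lines = text.strip().splitlines()
    return lines[0].strip() if lines else None


# Hypothetical condition information, one rule per item to be identified.
CONDITIONS = {"title": first_line}


def get_item_part(text, item):
    """Return the part of `text` that corresponds to `item`, or None."""
    # Case 1: the item is explicitly marked, here by a tag.
    tagged = re.search(rf"<{item}>(.*?)</{item}>", text, re.DOTALL)
    if tagged:
        return tagged.group(1).strip()
    # Case 2: the item is not marked; fall back to the condition information.
    rule = CONDITIONS.get(item)
    return rule(text) if rule else None
```

Keeping the conditions in a table separate from the lookup mirrors the description, in which the condition information is held on a recording medium and read per item.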
[0055] When the document information contains structure description information, the cited document information acquisition unit 14 may acquire a part of the cited document information, representing a cited document identified by the cited document identification information contained in all or part of the document information acquired by the document information acquisition unit 13, that the structure description information associates with the predetermined item.
[0056] As long as the cited document information acquisition unit 14 can, as a result, acquire a part of the cited document information as described above, the process by which it does so does not matter. For example, when the document information acquisition unit 13 acquires information including the title of the cited document as the cited document identification information, and the cited document information acquisition unit 14 is to acquire the part of the cited document information corresponding to the item "title", the cited document information acquisition unit 14 may acquire that part by taking the portion corresponding to the title of the cited document from the cited document identification information acquired by the document information acquisition unit 13.
[0057] The related term information acquisition unit 15 acquires related term information from the part of the cited document information acquired by the cited document information acquisition unit 14. Here, related term information is information indicating a related term that is related to the term indicated by the term information received by the term information receiving unit 12. The detailed configuration of the related term information acquisition unit 15 is described later with reference to FIG. 2.
[0058] The related term information output unit 16 outputs the related term information acquired by the related term information acquisition unit 15. This output may be, for example, display on a display device (such as a CRT or a liquid crystal display), transmission to a given device over a communication line, printing by a printer, audio output from a speaker, or storage on a recording medium. The related term information output unit 16 may or may not include the device that performs the output (for example, a display device or a printer). The related term information output unit 16 may be realized by hardware, or by software such as a driver that drives such a device.
[0059] FIG. 2 is a block diagram showing the configuration of the related term information acquisition unit 15. In FIG. 2, the related term information acquisition unit 15 includes related term candidate information acquisition means 21, relevance calculation means 22, and related term information determination means 23.
[0060] The related term candidate information acquisition means 21 acquires related term candidate information, which is information on candidates for related terms, from the part of the cited document information acquired by the cited document information acquisition unit 14. The related term candidate information acquisition means 21 may store the acquired related term candidate information in a recording medium (not shown). Several methods of acquiring related term candidate information are described here.
[0061] [Related term candidate information acquisition method 1]
The related term candidate information acquisition means 21 may perform morphological analysis on the part of the cited document information acquired by the cited document information acquisition unit 14 and acquire information indicating single nouns and/or compound nouns as related term candidate information. In addition to information indicating single nouns and the like, the related term candidate information acquisition means 21 may also acquire information indicating unknown words as related term candidate information. Morphological analysis is a process that divides a natural language document into morphemes and determines the part of speech of each morpheme. As software for morphological analysis, "ChaSen" (http://chasen.naist.jp), developed in Japan at the Nara Institute of Science and Technology, is known, among others. ChaSen can be used, for example, to obtain information indicating nominals. Here, nominals are single nouns, unknown words, or compound nouns formed by concatenating them. Known software for assigning parts of speech to English words includes "TnT" (http://www.coli.uni-saarland.de/~thorsten/tnt/) and "Brill Tagger" (http://www.cs.jhu.edu/~brill/). The method of acquiring information indicating single nouns and the like as related term candidate information is disclosed in Non-Patent Document 4 cited above, among others, and is conventionally known, so a detailed description is omitted.
[0062] [Related term candidate information acquisition method 2]
The related term candidate information acquisition means 21 may acquire information indicating technical terms as related term candidate information from the part of the cited document information acquired by the cited document information acquisition unit 14. The method of acquiring information indicating technical terms is as follows. First, in the same manner as in "Related term candidate information acquisition method 1" above, the related term candidate information acquisition means 21 extracts information indicating single nouns and the like from the part of the cited document information acquired by the cited document information acquisition unit 14. It then assigns each single noun or the like a score indicating how likely it is to be a technical term, and acquires the information indicating the highly scored single nouns and the like as related term candidate information. Several methods of assigning scores are known.
[0063] (Scoring based on word-formation ability)
Word-formation ability is a measure of how readily a single noun forms compound words. Information indicating a single noun or the like that contains a noun with high word-formation ability is given a high score. The connection frequency may be used as an index of word-formation ability. The connection frequency of a word is the number of distinct words that connect to it, that is, the number of different words that appear immediately before or immediately after it. In other words, the connection frequency is a measure of how readily the word is included in many compound nouns, or how readily it forms compound nouns. In this way, each single noun or the like can be scored on the basis of word-formation ability. For a compound noun, the geometric mean of the scores of its constituent single nouns may be used as the score of the compound noun.
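The connection frequency and the geometric-mean score for compound nouns can be sketched as below. The input format (each compound noun as a tuple of single nouns) and the add-one smoothing, which keeps a constituent with no observed neighbours from forcing the product to zero, are assumptions made for this example and are not stated in the description.

```python
from collections import defaultdict
from math import prod


def connection_counts(compounds):
    """Count, for each single noun, the distinct words seen directly before or after it.

    `compounds` is an iterable of tuples of single nouns, one tuple per compound noun.
    """
    left = defaultdict(set)   # distinct words appearing immediately before a word
    right = defaultdict(set)  # distinct words appearing immediately after a word
    nouns = set()
    for words in compounds:
        nouns.update(words)
        for a, b in zip(words, words[1:]):
            right[a].add(b)
            left[b].add(a)
    return {w: len(left[w]) + len(right[w]) for w in nouns}


def compound_score(words, counts):
    """Geometric mean of the constituents' scores (add-one smoothing is an assumption)."""
    values = [counts.get(w, 0) + 1 for w in words]
    return prod(values) ** (1 / len(values))


EXAMPLE_COMPOUNDS = [
    ("frequency", "converter"),
    ("frequency", "signal"),
    ("load", "detection", "means"),
    ("modulation", "means"),
]
```

With these example compounds, "frequency" is followed by two distinct words and "means" is preceded by two distinct words, so both receive a connection frequency of 2.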
[0064] (Scoring based on frequency of occurrence)
Let t be the single noun or the like under consideration, and let n be the number of words that make up t. The score for t is then calculated as follows.
Score of t = (frequency of occurrence of t) / (average frequency of occurrence of single nouns and the like consisting of n words)
[0065] The "single nouns and the like" in the above formula are those extracted by the related term candidate information acquisition means 21. The frequency of occurrence of t and the average frequency of occurrence of single nouns and the like consisting of n words are both calculated over the part of the cited document information acquired by the cited document information acquisition unit 14. The frequency of occurrence of t alone may also be used as the score of t.
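The frequency-based score can be sketched as follows. The sketch assumes that the candidates have already been extracted as tuples of words, one tuple per occurrence, and that the average in the denominator is taken over the distinct candidates with the same number of words; the description does not specify how the average is formed, so that choice is an assumption.

```python
from collections import Counter


def frequency_scores(occurrences):
    """Score each candidate t as freq(t) / mean freq of candidates with the same word count.

    `occurrences` is a list of tuples of words, one tuple per occurrence of a candidate.
    """
    freq = Counter(occurrences)
    by_length = {}
    for term, count in freq.items():
        by_length.setdefault(len(term), []).append(count)
    average = {n: sum(counts) / len(counts) for n, counts in by_length.items()}
    return {term: count / average[len(term)] for term, count in freq.items()}


EXAMPLE_OCCURRENCES = (
    [("load",)] * 3 + [("signal",)] * 1 + [("frequency", "converter")] * 2
)
```

A score above 1 means the candidate occurs more often than is typical for candidates of its length; a candidate that is the only one of its length always scores exactly 1.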
[0066] Scoring may also be performed by methods other than those described above, and information indicating technical terms may be acquired as related term candidate information by methods other than those described above. Methods of acquiring information indicating technical terms as related term candidate information are disclosed in Non-Patent Document 1 and Non-Patent Document 3 cited above and in the documents listed below, and are conventionally known, so a detailed description is omitted.
[0067] Reference: Hiroshi Nakagawa, "Automatic Term Recognition based on Statistics of Compound Nouns", Terminology, Vol. 6, No. 2, pp. 195-210, 2000
[0068] Reference: Hirokazu Ohata, Hiroshi Nakagawa, "Technical term extraction based on the number of distinct connected words" (in Japanese), IPSJ SIG Technical Report, 2000-NL-136, pp. 119-126
Reference: Hiroshi Nakagawa, Tatsunori Mori, Hiroaki Yumoto, "Technical term extraction based on frequency of occurrence and connection frequency" (in Japanese), Journal of Natural Language Processing, Vol. 10, No. 1, pp. 27-45, January 2003
[0069] [Related term candidate information acquisition method 3]
The related term candidate information acquisition means 21 may analyze the structure of the part of the cited document information acquired by the cited document information acquisition unit 14, identify noun groups, and acquire information indicating the single nouns and the like contained in the identified noun groups as related term candidate information. Here, a noun group is a group containing one or more single nouns. For example, suppose that the part of the cited document information acquired by the cited document information acquisition unit 14 is the following claim of a patent application.
[0070] 『操作手段によりアクチュエータを駆動して所望の作業を行う作業機において,前記作業の作業機構に作成する負荷を検出する負荷検出手段と,この負荷検出手段の検出値に応じた周波数の信号を出力する第1の周波数変換器と,当該負荷検出手段の検出値に応じた周波数のパルスを出力する第2の周波数変換器と,前記第1の周波数変換器から出力される信号を前記第2の周波数変換器からのパルスの出力期間だけ間欠的に出力する変調手段と,この変調手段の出力に応じて振動を発生する振動発生手段とを設けたことを特徴とする作業機の操作用仮想振動生成装置。』 (English rendering: "In a work machine that performs desired work by driving an actuator with operation means, a virtual vibration generating device for operating the work machine, characterized by comprising: load detection means for detecting a load produced in the work mechanism for the work; a first frequency converter that outputs a signal of a frequency corresponding to the detection value of the load detection means; a second frequency converter that outputs pulses of a frequency corresponding to the detection value of the load detection means; modulation means for intermittently outputting the signal output from the first frequency converter only during the output periods of the pulses from the second frequency converter; and vibration generation means for generating vibration in accordance with the output of the modulation means.")
[0071] Claims written in Japanese generally have the following structural features. The first is that a noun group exists at the end of the claim. In the above example, 「作業機の操作用仮想振動生成装置」 ("virtual vibration generating device for operating a work machine") is that noun group. The second is that, when the claim is divided into a first half and a second half by a character string such as 「において、」 ("in ...,") or 「であって、」 ("being ...,"), a noun group like the one at the end also exists immediately before that string. In the above example, 「作業機」 ("work machine") is that noun group. The related term candidate information acquisition means 21 may therefore acquire the noun group at the end of the claim and the noun group immediately before 「において、」 or the like, and acquire information indicating the single nouns and the like contained in those noun groups as related term candidate information. The related term candidate information acquisition means 21 can locate the noun group immediately before 「において、」 or the like by using that string as a cue phrase. Although claims of patent application documents have been described here, noun groups can be identified in the same way in other documents that have a set structure, and related term candidate information can be acquired from them.
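The two structural cues of paragraph [0071] can be sketched as below. This is a rough heuristic, not the structural analysis of the cited literature: it assumes that a noun group can be approximated as a trailing run of kanji or katakana, optionally linked by 「の」, and it uses a shortened, hypothetical variant of the example claim. A real implementation would rely on morphological analysis.

```python
import re

# Assumed approximation of a noun group: kanji/katakana runs joined by "の", at the end of the text.
NOUN_GROUP = re.compile(r"(?:[一-龥ァ-ヶー]+の)*[一-龥ァ-ヶー]+$")
# Cue phrases that split a claim into its first and second halves.
CUE = re.compile(r"(?:において|であって)[、,,]")


def trailing_noun_group(text):
    """Return the noun group at the end of `text`, or None if there is none."""
    match = NOUN_GROUP.search(text.rstrip("。. "))
    return match.group(0) if match else None


def claim_noun_groups(claim):
    """Return the noun group before the cue phrase (if any) and the one ending the claim."""
    groups = []
    cue = CUE.search(claim)
    if cue:
        before = trailing_noun_group(claim[:cue.start()])
        if before:
            groups.append(before)
    last = trailing_noun_group(claim)
    if last:
        groups.append(last)
    return groups


# Shortened, hypothetical variant of the example claim.
EXAMPLE_CLAIM = (
    "操作手段によりアクチュエータを駆動する作業機において,"
    "負荷を検出する負荷検出手段を設けたことを特徴とする作業機の操作用仮想振動生成装置。"
)
```

The single nouns inside each returned group would then be extracted by one of the candidate acquisition methods described earlier.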
[0072] Methods of analyzing the structure of a document, identifying noun groups, and acquiring information indicating the single nouns and the like contained in the identified noun groups are disclosed in the documents listed below and are conventionally known, so a detailed description is omitted.
[0073] Reference: Akihiro Shinmori, Manabu Okumura, Yuzo Marukawa, Makoto Iwayama, "Structure analysis of patent claims using cue phrases" (in Japanese), IPSJ Journal, Vol. 45, No. 3, pp. 891-905, 2004
Reference: Japanese Unexamined Patent Application Publication No. 2004-164054
[0074] The related term candidate information acquisition means 21 may exclude information indicating frequently occurring general nouns and the like from the related term candidate information. For example, the related term candidate information acquisition means 21 may hold one or more pieces of information indicating nouns and the like to be excluded in a recording medium (not shown) and, when the information acquired as related term candidate information includes any of that information, exclude it from the related term candidate information. Examples of nouns to be excluded are "method", "description", "publication", and "document".
[0075] The relevance calculation means 22 calculates, for each piece of related term candidate information acquired by the related term candidate information acquisition means 21, the relevance of the term indicated by that related term candidate information. The relevance calculation means 22 may store the calculated relevance in a recording medium (not shown) in association with the related term candidate information acquired by the related term candidate information acquisition means 21.
[0076] The method of calculating the relevance is described here. The relevance calculation means 22 may take, as the relevance of a piece of related term candidate information, the frequency with which the term indicated by that related term candidate information appears in the part of the cited document information acquired by the cited document information acquisition unit 14 (when one or more parts of cited document information have been acquired, in the set of those parts). Alternatively, for each piece of related term candidate information, the relevance calculation means 22 may calculate a score indicating the likelihood of being a technical term, as described under "Related term candidate information acquisition method 2", for example a score based on word-formation ability, a score based on frequency of occurrence, or a score based on connection frequency, and use that score as the relevance. In this case, if the score has already been calculated by the related term candidate information acquisition means 21, the relevance calculation means 22 may obtain the score from the related term candidate information acquisition means 21 or may calculate the score anew. Calculation of the relevance by the relevance calculation means 22 is therefore taken to include obtaining the relevance. The relevance calculation means 22 may also calculate the relevance using both the frequency of occurrence of the term indicated by the related term candidate information and the above score, for example by adding the two or by multiplying the two.
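The relevance options in paragraph [0076] can be sketched as follows. The function name, the use of plain substring counting for the frequency of occurrence, and the `combine` parameter are assumptions made for the example; the technical-term scores are taken as given input.

```python
def relevance(candidates, parts, term_scores=None, combine="multiply"):
    """Compute a relevance value for each candidate term.

    candidates:  list of candidate terms (strings).
    parts:       acquired parts of cited document information (strings).
    term_scores: optional technical-term score per candidate; if None,
                 the frequency of occurrence alone is the relevance.
    combine:     "add" or "multiply", used only when term_scores is given.
    """
    # Join with a newline so that a term cannot match across two parts.
    text = "\n".join(parts)
    result = {}
    for term in candidates:
        frequency = text.count(term)
        if term_scores is None:
            result[term] = float(frequency)
        elif combine == "add":
            result[term] = frequency + term_scores.get(term, 0.0)
        else:
            result[term] = frequency * term_scores.get(term, 0.0)
    return result


EXAMPLE_PARTS = [
    "load detection means detects a load",
    "a frequency converter and load detection means",
]
EXAMPLE_CANDIDATES = ["load detection means", "frequency converter"]
EXAMPLE_SCORES = {"load detection means": 1.5, "frequency converter": 2.0}
```

Multiplication gives a candidate that never occurs a relevance of zero regardless of its score, whereas addition lets a high score carry a rare term; which behaviour is preferable depends on the application.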
[0077] The related term information determination means 23 determines, as related term information, those pieces of the related term candidate information acquired by the related term candidate information acquisition means 21 whose relevance ranks high. For example, the related term information determination means 23 may determine as related term information a fixed number (for example, 30) or a fixed proportion (for example, 30%) of the pieces of related term candidate information in descending order of relevance, may determine as related term information those pieces whose relevance is at or above a predetermined threshold, or may select the highly ranked pieces by some other method. More specifically, the related term information determination means 23 sorts the relevance values calculated by the relevance calculation means 22, selects the related term candidate information corresponding to the top of the sorted values, and determines the related term information by storing the selection as related term information in a recording medium (not shown). The related term information stored in this recording medium is the related term information that the related term information output unit 16 reads and outputs.
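The three selection rules in paragraph [0077] can be sketched as follows. The function and parameter names are hypothetical, and rounding the proportion up to a whole number of terms is an assumption, since the description does not say how a fractional count is handled.

```python
import math


def select_related_terms(relevance, top_n=None, top_ratio=None, threshold=None):
    """Pick related terms from a {term: relevance} mapping.

    Exactly one of top_n, top_ratio, or threshold is expected; with none given,
    all terms are returned in descending order of relevance.
    """
    ranked = sorted(relevance.items(), key=lambda item: item[1], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    elif top_ratio is not None:
        ranked = ranked[:math.ceil(len(ranked) * top_ratio)]
    elif threshold is not None:
        ranked = [item for item in ranked if item[1] >= threshold]
    return [term for term, _ in ranked]


EXAMPLE_RELEVANCE = {"a": 3.0, "b": 1.0, "c": 2.0, "d": 0.5}
```

For instance, `select_related_terms(EXAMPLE_RELEVANCE, top_n=2)` keeps the two highest-ranked terms, and `threshold=1.0` keeps every term whose relevance is at least 1.0.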
[0078] Next, the operation of the related term acquisition device 1 according to the present embodiment is described with reference to flowcharts. FIG. 3 is a flowchart showing the overall operation of the related term acquisition device 1.
[0079] (Step S101) The term information receiving unit 12 determines whether term information has been received. If term information has been received, the process proceeds to step S102; otherwise, step S101 is repeated until term information is received.
[0080] (Step S102) The document information acquisition unit 13 searches the two or more pieces of document information stored in the document information storage unit 11 for document information that contains, in all or part of it, the term indicated by the term information received by the term information receiving unit 12. It then acquires from the document information storage unit 11 either all of the retrieved document information or a part of it that includes at least the cited document identification information, that is, the information identifying the other documents cited in the retrieved document information. The details of this process are described later with reference to the flowchart of FIG. 4.
[0081] (Step S103) The cited document information acquisition unit 14 searches the two or more pieces of document information stored in the document information storage unit 11 for cited document information, that is, information representing the documents identified by the cited document identification information contained in all or part of the document information acquired by the document information acquisition unit 13. It then acquires from the document information storage unit 11 the part of the retrieved cited document information that corresponds to predetermined items. The details of this process are described later with reference to the flowchart of FIG. 5.
[0082] (Step S104) The related term information acquisition unit 15 acquires, from the part of the cited document information acquired by the cited document information acquisition unit 14, related term information, that is, information indicating related terms that are related to the term indicated by the term information received by the term information receiving unit 12. The details of this process are described later with reference to the flowchart of FIG. 6.
[0083] (Step S105) The related term information output unit 16 outputs the related term information acquired by the related term information acquisition unit 15. The process then returns to step S101.
Note that in the flowchart of FIG. 3, processing ends when the power is turned off or when an interrupt requesting termination occurs.
[0084] FIG. 4 is a flowchart showing the details of the document information acquisition process of step S102 in the flowchart of FIG. 3.
(Step S201) The document information acquisition unit 13 searches the two or more pieces of document information stored in the document information storage unit 11 for document information that contains, in all or part of it, the term information received by the term information receiving unit 12.
[0085] (Step S202) The document information acquisition unit 13 sets a counter i to 1.
(Step S203) The document information acquisition unit 13 determines whether an i-th piece of document information retrieved in step S201 exists. If it exists, the process proceeds to step S204. If it does not, this series of processes ends and the process proceeds to step S103 in the flowchart of FIG. 3.
[0086] (Step S204) The document information acquisition unit 13 acquires all or part of the i-th retrieved document information. When only a part of the i-th retrieved document information is acquired, the acquired part includes at least the cited document identification information, as described above. If the i-th retrieved document information contains no cited document identification information at all, the document information acquisition unit 13 need not acquire anything.
(Step S205) The document information acquisition unit 13 increments the counter i by 1 and returns to step S203.
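Steps S201 to S205 can be sketched as a search followed by a loop over the hits. This is a simplified illustration under stated assumptions: the `Document` class, the in-memory dictionary standing in for the document information storage unit 11, and plain substring matching are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """Hypothetical record standing in for one piece of document information."""
    doc_id: str
    text: str
    cited_ids: List[str] = field(default_factory=list)


def acquire_documents(store: Dict[str, Document], term: str) -> List[Document]:
    # Step S201: search the store for documents whose text contains the term.
    hits = [doc for doc in store.values() if term in doc.text]
    acquired: List[Document] = []
    # Steps S202 to S205: visit the hits one by one (the counter i).
    for doc in hits:
        # Step S204: a hit without any cited document identifiers
        # may be skipped, since nothing needs to be acquired from it.
        if doc.cited_ids:
            acquired.append(doc)
    return acquired
```

The loop ends when no further hit exists, which corresponds to the exit from step S203 to step S103.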
[0087] FIG. 5 is a flowchart showing the details of the cited document information acquisition process of step S103 in the flowchart of FIG. 3.
(Step S301) The cited document information acquisition unit 14 sets a counter i to 1.
[0088] (Step S302) The cited document information acquisition unit 14 determines whether an i-th piece of document information acquired by the document information acquisition unit 13 exists. If it exists, the process proceeds to step S303. If it does not, this series of processes ends and the process proceeds to step S104 in the flowchart of FIG. 3.
[0089] (Step S303) The cited document information acquisition unit 14 acquires all the cited document identification information contained in the i-th document information acquired by the document information acquisition unit 13.
(Step S304) The cited document information acquisition unit 14 sets a counter j to 1.
[0090] (Step S305) The cited document information acquisition unit 14 determines whether a j-th piece of cited document identification information acquired in step S303 exists. If it exists, the process proceeds to step S307; if it does not, the process proceeds to step S306.
(Step S306) The cited document information acquisition unit 14 increments the counter i by 1. The process then returns to step S302.
[0091] (Step S307) The cited document information acquisition unit 14 searches the document information storage unit 11 for the cited document information identified by the j-th cited document identification information acquired in step S303, and acquires the predetermined part of the retrieved cited document information from the document information storage unit 11.
[0092] (Step S308) The cited document information acquisition unit 14 determines whether the part of the cited document information acquired in step S307 has already been acquired. If it has, the process proceeds to step S310; otherwise, the process proceeds to step S309.
[0093] (Step S309) The cited document information acquisition unit 14 temporarily stores the part of the cited document information acquired in step S307. The parts of cited document information accumulated in this step S309 become the parts of cited document information finally acquired by the cited document information acquisition unit 14.
(Step S310) The cited document information acquisition unit 14 increments the counter j by 1. The process then returns to step S305.
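The nested loops of steps S301 to S310 can be sketched as below. This is an illustrative sketch only: the function name, the dictionary-based store, and the `item` key are hypothetical, and two behaviours are assumptions not stated in the source, namely that duplicates are detected by cited document identifier and that identifiers missing from the store are skipped.

```python
from typing import Dict, List, Set


def acquire_cited_parts(
    acquired_docs: List[List[str]],
    store: Dict[str, Dict[str, str]],
    item: str,
) -> List[str]:
    """Collect the predetermined part (`item`) of every cited document.

    acquired_docs[i] is the list of cited document identifiers found in
    the i-th acquired document; `store` maps an identifier to its record.
    """
    parts: List[str] = []
    seen_ids: Set[str] = set()
    for cited_ids in acquired_docs:       # outer loop: counter i (S302, S306)
        for cited_id in cited_ids:        # inner loop: counter j (S305, S310)
            record = store.get(cited_id)  # S307: look up the cited document
            if record is None or item not in record:
                continue                  # assumption: skip what cannot be found
            if cited_id in seen_ids:      # S308: already acquired?
                continue
            seen_ids.add(cited_id)
            parts.append(record[item])    # S309: keep the predetermined part
    return parts
```

The duplicate check matters because several acquired documents may cite the same document; without it, the same claims would be counted more than once in later steps.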
[0094] FIG. 6 is a flowchart showing the details of the related term information acquisition process of step S104 in the flowchart of FIG. 3.
(Step S401) The related term candidate information acquisition means 21 acquires related term candidate information from the part of the cited document information acquired by the cited document information acquisition unit 14 (or, when two or more parts have been acquired, from the set of those parts), that is, from the parts of cited document information temporarily stored in step S309.
[0095] (Step S402) The relevance calculation means 22 sets a counter i to 1.
(Step S403) The relevance calculation means 22 determines whether an i-th piece of related term candidate information, acquired from the part of the cited document information by the related term candidate information acquisition means 21, exists. If it exists, the process proceeds to step S404; if it does not, the process proceeds to step S407.
[0096] (Step S404) The relevance calculation means 22 calculates the degree of relevance of the i-th piece of related term candidate information acquired from the part of the cited document information by the related term candidate information acquisition means 21.
[0097] (Step S405) The related term information determining means 23 temporarily stores the degree of relevance calculated in step S404 in association with the corresponding related term candidate information. Here, the related term candidate information corresponding to the degree of relevance calculated in step S404 is the i-th piece of related term candidate information acquired from the part of the cited document information by the related term candidate information acquisition means 21.
[0098] (Step S406) The relevance calculation means 22 increments the counter i by 1. The process then returns to step S403.
(Step S407) The related term information determining means 23 sorts the degrees of relevance temporarily stored in step S405.
[0099] (Step S408) The related term information determining means 23 determines, as the related term information, the related term candidate information corresponding to the top degrees of relevance sorted in step S407. This series of processes then ends, and the process proceeds to step S105 in the flowchart of FIG. 3.
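The scoring loop and sort of steps S402 to S407 can be sketched as follows. The relevance function is passed in as a parameter because its definition lies outside this passage; the occurrence count used in the usage lines is only a placeholder assumption, and `rank_candidates` is a hypothetical name.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def rank_candidates(
    candidates: Iterable[str],
    relevance: Callable[[str], float],
) -> List[Tuple[str, float]]:
    scored: List[Tuple[str, float]] = []
    for term in candidates:                     # S402, S403, S406: loop over candidates
        scored.append((term, relevance(term)))  # S404, S405: score and remember
    # S407: sort by descending relevance. Python's sort is stable,
    # so candidates with equal relevance keep their original order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Usage with a placeholder relevance: the number of occurrences of each noun.
counts = Counter(["memory", "cell", "memory", "device"])
ranked = rank_candidates(counts, lambda term: counts[term])
```

Step S408 would then keep only the top of `ranked`, using whichever cut-off rule the device is configured with.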
[0100] Next, an experimental example showing the concrete effects obtained with the related term acquisition device 1 according to the present embodiment is described. In this experimental example, the document information acquired by the document information acquisition unit 13 consists of academic papers, while the cited document information acquired by the cited document information acquisition unit 14 is document information on patents. Accordingly, when term information indicating an academic paper term is input to the related term acquisition device 1, related term information indicating patent terms related to that academic paper term is output. Here, an academic paper term is a term used relatively often in academic papers (for example, a concrete, lower-level term such as "DRAM" or "word processor"), and a patent term is a term used relatively often in patent documents (for example, an abstract, higher-level term such as "semiconductor memory device" or "document editing device").
[0101] In this experimental example, the document information stored in the document information storage unit 11 includes the information of patent publications (1993 to 2002). It also includes the bibliographic information of about 85,000 papers cited in those patent publications. The method disclosed in the following reference was used to extract, from a patent publication, the bibliographic information of the papers cited in it.
[0102] Reference: Natsumi Anzen, Hidetsugu Nanba, Teruaki Aizawa, Manabu Okumura, "Construction of a Search Environment Integrating Patent and Paper Databases" (in Japanese), IPSJ SIG Technical Report, NL-168, pp. 21-26, 2005.
[0103] First, the term information received by the term information receiving unit 12 is described. Nouns were extracted from the information of the patent publications stored in the document information storage unit 11. The extracted nouns were sorted by frequency, and 25 paper terms were selected manually from among the high-frequency nouns. Specifically, "CPU", "semiconductor laser", "DRAM", "memory cell", "word processor", "novolak resin", "CD", and others were selected.
[0104] In this experimental example, the patterns specifying which portion of the cited document information the cited document information acquisition unit 14 acquires are as follows.
[0105] (Pattern 1) Information indicating the first claim of the cited document, that is, the claim listed first in the claims, is acquired as the part of the cited document information.
[0106] (Pattern 2) Information indicating all claims of the cited document, that is, every claim listed in the claims, is acquired as the part of the cited document information.
[0107] (Pattern 3) Information indicating the first claim of the cited document, together with information indicating the dependent claims that depend on that first claim, is acquired as the part of the cited document information. Here, the claims that depend on the first claim include not only claims that depend on it directly but also claims that depend on it through other claims.
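Patterns 1 to 3 can be sketched as a selection over a claim dependency map; Pattern 3 needs a transitive closure because indirect dependents count as well. This is a minimal sketch: the function name and the `depends_on` input (claim number mapped to the claim numbers it refers to) are hypothetical, and how that map is parsed from claim text is not shown.

```python
from typing import Dict, List, Set


def select_claims(depends_on: Dict[int, Set[int]], pattern: int) -> List[int]:
    """Return the claim numbers to acquire under Pattern 1, 2 or 3."""
    if pattern == 1:
        return [1]                 # first claim only
    if pattern == 2:
        return sorted(depends_on)  # all claims
    if pattern != 3:
        raise ValueError("pattern must be 1, 2 or 3")
    # Pattern 3: claim 1 plus every claim that depends on it, directly
    # or through other claims. Repeat until no new claim is added.
    selected = {1}
    changed = True
    while changed:
        changed = False
        for claim, parents in depends_on.items():
            if claim not in selected and parents & selected:
                selected.add(claim)
                changed = True
    return sorted(selected)
```

For a publication where claim 2 depends on claim 1, claim 3 on claim 2, and claims 4 and 5 form a separate family, Pattern 3 selects claims 1, 2 and 3 and leaves the second family out.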
[0108] Also, in this experimental example, the patterns by which the related term candidate information acquisition means 21 acquires related term candidate information are as follows.
[0109] (Pattern A) As described in acquisition method 1 for related term candidate information, morphological analysis is performed on the part of the cited document information, and nouns are acquired as related term candidates.
[0110] (Pattern B) As described in acquisition method 3 for related term candidate information, noun groups are identified in the part of the cited document information, and the nouns contained in those noun groups are acquired as related term candidates.
[0111] Note that the related term candidate information acquisition means 21 uses a predetermined stop-word list and does not acquire any related term candidate that matches an entry in that list. Terms such as 方法 ("method") and 記載 ("description") appear in many patent claims regardless of field, so the list serves to exclude them from the related term candidates. To build the stop-word list, the nouns contained in ten years of patent publications were sorted by frequency of occurrence, and the entries were selected manually from among the high-frequency nouns. The list contains 350 stop words in total, for example 方法 (method), 記載 (description), 発行 (publication), 文献 (document), 使用 (use), 利用 (utilization), 詳細 (detail), 製造 (manufacture), 提案 (proposal), 製造方法 (manufacturing method), and データ (data).
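The stop-word filtering can be sketched as a simple membership test. The sketch below is illustrative only: `STOP_WORDS` holds just the eleven entries quoted in the text rather than the full 350-entry list, and `drop_stop_words` is a hypothetical name.

```python
from typing import FrozenSet, Iterable, List

# Only the entries quoted in the text; the full list has 350 entries.
STOP_WORDS: FrozenSet[str] = frozenset({
    "方法", "記載", "発行", "文献", "使用", "利用",
    "詳細", "製造", "提案", "製造方法", "データ",
})


def drop_stop_words(
    nouns: Iterable[str],
    stop_words: FrozenSet[str] = STOP_WORDS,
) -> List[str]:
    # Keep a noun only if it does not exactly match a stop-word entry.
    return [noun for noun in nouns if noun not in stop_words]
```

Exact matching is used here, so a compound such as 製造方法 is removed only because it is itself an entry, not because it contains 方法.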
[0112] Also, in this experimental example, in order to verify the effect of acquiring only a part of the cited document information, the relevance calculation means 22 does not calculate degrees of relevance. The related term information determining means 23 simply adopts the related term candidate information acquired by the related term candidate information acquisition means 21 as the related term information.
[0113] Under these conditions, the processing was executed for each of the 25 pieces of term information described above, from reception of the term information by the term information receiving unit 12 to output of the related term information by the related term information output unit 16. The related term information output unit 16 stored the related term information on a predetermined recording medium. The stored related term information was then used for verification.
[0114] In this verification, whether each piece of related term information was correct was judged according to the following criteria.
(Criterion 1) Only the conceptually closest term is correct
For example, for the term information "word processor", the related term information "document editing device" was judged correct, and the related term information "display device", which is a component of a word processor, was judged incorrect.
[0115] (Criterion 2) Frequency of occurrence in the patent database
If the frequency of occurrence of a term in the patent database was extremely low, the term was considered not useful for acquiring patent-related terms and was judged incorrect.
[0116] (Criterion 3) Comparison with the terms selected under Criterion 1
Even when a term did not satisfy Criterion 2, it was judged correct despite its low frequency if it was judged conceptually almost equal to a term selected under Criterion 1. For example, for the term information "word processor", the related term information "document creation device", which is conceptually almost equal to the related term information "document editing device", was also judged correct. For spelling variants such as レーザ and レーザー (both "laser"), if one of the pair was judged correct, the other was judged correct as well.
[0117] In this verification, the evaluation measure ε given by the following equation was used. It is an extension of MRR (mean reciprocal rank), which is commonly used in the evaluation of question answering systems, and is disclosed in the following reference.
[0118] Reference: Yoji Kiyota, Sadao Kurohashi, Fuyuko Kido, "Resolving Gaps in Dependency Relations Using Automatically Extracted Metonymic Expressions" (in Japanese), Journal of Natural Language Processing, Vol. 11, No. 4, pp. 127-145, 2004.
[Equation 1]
$$\varepsilon = \frac{\sum_{i \in R} \frac{1}{i}}{\sum_{i \in \{1, \ldots, n\}} \frac{1}{i}}$$
[0119] Here, n is the number of correct answers for the input, and R is the set of rank numbers at which correct answers appear in the output list. ε takes its maximum value of 1 when all the correct answers are ranked at the top of the list.
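The measure can be sketched numerically as below, under the assumption that ε is the sum of the reciprocal ranks of the correct answers in the output, divided by the sum of 1/i for i = 1 to n. That reading is consistent with the stated property that ε reaches 1 exactly when the n correct answers occupy ranks 1 to n. The function name `epsilon` is hypothetical.

```python
from typing import Iterable


def epsilon(correct_ranks: Iterable[int], n: int) -> float:
    """Evaluation measure for one input term.

    correct_ranks: 1-based ranks at which correct answers appear (the set R).
    n: total number of correct answers for the input.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    numerator = sum(1.0 / rank for rank in correct_ranks)
    # Best case: the n correct answers sit at ranks 1..n.
    denominator = sum(1.0 / i for i in range(1, n + 1))
    return numerator / denominator
```

For example, with two correct answers found at ranks 1 and 3, the numerator is 1 + 1/3 and the denominator is 1 + 1/2, so the score is below 1 because the second correct answer was ranked third rather than second.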
[0120] FIG. 7 is a table showing the results of this experimental example. In FIG. 7, the ID identifies each experiment. The ID6 experiment is the baseline for the other experiments: nouns that co-occur with the term information with high frequency are extracted and used as the related terms. This ID6 baseline is the method conventionally used for acquiring related terms. As the table of FIG. 7 shows, comparing the ID1 experiment with the ID2 experiment, acquiring nouns from the noun groups in the claims gives better results than simply acquiring nouns. Comparing the ID3 experiment with the ID4 experiment, acquiring nouns from the noun groups in the claims again outperforms simply acquiring nouns. This shows that it is effective to acquire nouns from the noun groups in the claims rather than simply acquiring nouns.
[0121] Next, whether it is effective to take the relationships between claims into account is examined. Comparing the results of experiments ID2, ID4, and ID5, the ID5 experiment, which uses the first claim and its dependent claims, gives the best result, and the ID2 experiment, which uses only the first claim, gives the worst. The ID2 result is poor because the restriction on which portion of the cited document is acquired is too strong: noise is reduced, but so is the number of correct answers acquired. The ID3 experiment, which uses all claims, extracts more correct answers than the ID2 experiment but also extracts many incorrect ones, so its result is inferior to that of the ID5 experiment.
[0122] For each method, the number of claims used when extracting nouns from a single patent publication was examined; the results are shown in FIG. 8. Although the ID5 experiment draws on less than [...] tenths as many claims as the ID4 experiment, its accuracy is higher than that of the ID4 experiment. This indicates that the correct answers are contained, with high probability, in the first claim and its dependent claims.
[0123] As described above, with the related term acquisition device 1 according to the present embodiment, related term information can be acquired effectively by acquiring only a part of the cited document information. It is therefore possible to provide a related term acquisition device 1 that can appropriately acquire related term information related to the received term information.
[0124] The part of the cited document information acquired by the cited document information acquisition unit 14 may be any part from which related term information can be appropriately acquired. For example, when the cited document information represents a patent document, the part corresponding to the predetermined items may be the independent claims. An independent claim is a claim that does not depend on any other claim. The cited document information acquisition unit 14 may decide whether a given claim is independent as follows, for example. It checks whether the claim under consideration contains a reference to another claim, that is, the expression 請求項N ("claim N"). If it does, the unit judges that the claim is not independent; if it does not, the unit judges that the claim is independent. Here, N is any integer of 1 or more.
As another example, when the cited document information represents a patent document, the part corresponding to the predetermined items may be the part describing the effects of the invention. That part may be, for example, the part corresponding to the item 発明の効果 ("Effects of the Invention"), or a part that is judged from the content of the document to describe the effects of the invention. In the latter case, one way to identify the part is to select any paragraph that begins with 以上のように ("as described above") or このように ("in this way") and contains any of the words 効果 ("effect"), 可能 ("possible"), or できる ("can").
As a further example, when the cited document information represents a patent document, the part corresponding to the predetermined items may be the part of a claim other than its preamble. The preamble of a claim is, in general, the part written before において ("in") or であって ("being") in the claim. Accordingly, the part other than the preamble is the part containing the characterizing description of the claim and is, in general, the part written after において or であって. If a claim has no preamble, the entire claim is treated as the part other than the preamble.
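The three heuristics above can be sketched with plain string and regular expression checks on Japanese patent text. This is a hedged sketch: all function names are hypothetical, whitespace between 請求項 and the number is tolerated as an assumption, and where a claim contains more than one cue phrase the sketch splits at the earliest one, which the source does not specify.

```python
import re

# "請求項" followed by a number; \d also matches full-width digits in Python 3.
CLAIM_REFERENCE = re.compile(r"請求項\s*\d+")

EFFECT_OPENERS = ("以上のように", "このように")
EFFECT_WORDS = ("効果", "可能", "できる")
PREAMBLE_CUES = ("において", "であって")


def is_independent_claim(claim_text: str) -> bool:
    # Independent if the claim never refers to another claim.
    return CLAIM_REFERENCE.search(claim_text) is None


def is_effect_paragraph(paragraph: str) -> bool:
    # Must start with an opener and contain at least one effect word.
    return paragraph.startswith(EFFECT_OPENERS) and any(
        word in paragraph for word in EFFECT_WORDS
    )


def characterizing_part(claim_text: str) -> str:
    # Locate the earliest cue phrase (assumption); text after it is kept.
    found = [(claim_text.find(cue), cue) for cue in PREAMBLE_CUES if cue in claim_text]
    if not found:
        return claim_text  # no preamble: the whole claim is kept
    position, cue = min(found)
    return claim_text[position + len(cue):].lstrip("、 ")
```

A claim such as "請求項1に記載の装置。" is classified as dependent by the first function, while a claim with no such reference is treated as independent.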
[0125] (Embodiment 2)
A related term acquisition device according to Embodiment 2 of the present invention is described with reference to the drawings. The related term acquisition device according to this embodiment further includes a cited document structure information acquisition unit.
[0126] FIG. 9 is a block diagram showing the configuration of the related term acquisition device 2 according to the present embodiment. In FIG. 9, the related term acquisition device 2 according to the present embodiment includes a document information storage unit 11, a term information receiving unit 12, a document information acquisition unit 13, a cited document information acquisition unit 14, a related term information acquisition unit 15, a related term information output unit 16, and a cited document structure information acquisition unit 31. The configuration and operation of everything other than the cited document structure information acquisition unit 31 are the same as in Embodiment 1, and their description is omitted.
[0127] The cited document structure information acquisition unit 31 acquires cited document structure information, which is information indicating the correspondence between one or more parts of the cited document information stored in the document information storage unit 11 and the items of those parts. The cited document structure information may have any structure as long as it can associate an item with the part of the cited document information corresponding to that item. For example, the cited document structure information may indicate an item and its corresponding part by tags. That is, in the cited document information, the region enclosed between <effect> and </effect> may be the part corresponding to the item "effect". In this case, the cited document structure information may be stored in the document information storage unit 11. Alternatively, the cited document structure information may hold, in association with the item "effect", information indicating the part of the cited document information corresponding to that item. In this case, the cited document structure information may be held in the document information storage unit 11 in association with the cited document information, or may be held in a recording medium (not shown) of the cited document structure information acquisition unit 31 or of another component. In the latter case, it is preferable to hold the cited document structure information in association with the cited document identification information. The information indicating a part of the cited document information may be, for example, information indicating the start position and the end position of that part. The start and end positions may be expressed, for example, as the number of bytes from the beginning of the cited document information, or as a page number and a line number.
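The two representations described above (tag-delimited regions and start/end positions) can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the device: the function names `section_by_tag` and `section_by_offsets` and the sample text are hypothetical.

```python
import re

def section_by_tag(cited_text: str, item: str) -> list:
    """Return every region enclosed by <item> ... </item> (hypothetical tag form)."""
    name = re.escape(item)
    pattern = re.compile(rf"<{name}>(.*?)</{name}>", re.DOTALL)
    return [m.group(1) for m in pattern.finditer(cited_text)]

def section_by_offsets(cited_bytes: bytes, start: int, end: int) -> bytes:
    """Return the part between two byte offsets counted from the head of the document."""
    return cited_bytes[start:end]

# Hypothetical cited document information with explicit item tags.
doc = "<problem>Slow search.</problem><effect>Search becomes faster.</effect>"
effect_parts = section_by_tag(doc, "effect")

# The same part expressed as byte offsets instead of tags.
raw = doc.encode("utf-8")
start = raw.index(b"Search becomes")
offset_part = section_by_offsets(raw, start, start + len(b"Search becomes faster."))
```

Byte offsets keep the structure information separate from the document text, which matches the case in which the structure information is held outside the document information storage unit.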
[0128] Here, how the cited document structure information acquisition unit 31 identifies the part corresponding to an item is described. For example, the cited document structure information acquisition unit 31 holds, in a recording medium (not shown), condition information indicating the condition corresponding to each item to be identified. It then identifies the part of the cited document information that satisfies the condition information and associates the item with that part, thereby acquiring the cited document structure information. For example, suppose the condition information for the item "effect" is that a paragraph begins with "as described above" (以上のように) or "in this way" (このように) and contains any of the words "effect" (効果), "possible" (可能), or "can" (できる). The cited document structure information acquisition unit 31 can then acquire the cited document structure information by identifying a paragraph that meets the condition and associating that paragraph with the item "effect". Similarly, suppose the condition information for the item "characteristic description of a claim" is that the part is contained in a claim, follows the phrase であって or において (roughly "wherein" or "in which"), and ends with the full stop "。". The cited document structure information acquisition unit 31 can then acquire the cited document structure information by identifying a part that meets the condition and associating it with the item "characteristic description of a claim". A method of obtaining the characteristic description of a claim is described in the previously cited document: Akihiro Shinmori, Manabu Okumura, Yuzo Marukawa, and Makoto Iwayama, "Structure analysis of patent claims using clue phrases", IPSJ Journal (Transactions of the Information Processing Society of Japan), Vol. 45, No. 3, pp. 891-905, 2004.
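The condition for the item "effect" described above can be sketched as a simple rule. This is a minimal illustration, assuming the cited document has already been split into paragraphs; the names `is_effect_paragraph` and `build_structure_info` and the sample paragraphs are hypothetical.

```python
# Opening phrases and keywords taken from the condition described in the text.
EFFECT_OPENERS = ("以上のように", "このように")   # "as described above", "in this way"
EFFECT_WORDS = ("効果", "可能", "できる")          # "effect", "possible", "can"

def is_effect_paragraph(paragraph: str) -> bool:
    """True if the paragraph starts with an opener and contains any keyword."""
    text = paragraph.strip()
    return text.startswith(EFFECT_OPENERS) and any(w in text for w in EFFECT_WORDS)

def build_structure_info(paragraphs: list) -> dict:
    """Map the item "effect" to the indices of the paragraphs meeting its condition."""
    return {"effect": [i for i, p in enumerate(paragraphs) if is_effect_paragraph(p)]}

# Hypothetical paragraphs of one cited document.
paragraphs = [
    "従来の装置は処理が遅い。",
    "以上のように、本発明によれば高速な検索が可能となる。",
    "このように構成する。",
]
structure_info = build_structure_info(paragraphs)
```

Only the second paragraph satisfies both parts of the condition; the third starts with an opener but contains none of the keywords.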
[0129] Next, the overall operation of the related term acquisition device 2 according to the present embodiment is described with reference to the flowchart of FIG. 10. The processing other than steps S501 and S502 is the same as in the flowchart of FIG. 3 of Embodiment 1, and its description is omitted. However, in step S103, the cited document information acquisition unit 14 uses the cited document structure information acquired by the cited document structure information acquisition unit 31 to acquire the part of the cited document information that corresponds to a predetermined item.
[0130] (Step S501) The cited document structure information acquisition unit 31 determines whether the cited document structure information has already been acquired. If it has not, the process proceeds to step S502; if it has, the process proceeds to step S102.
[0131] (Step S502) The cited document structure information acquisition unit 31 acquires cited document structure information for all of the cited document information stored in the document information storage unit 11. Because it is not known which of the stored document information is cited document information until the document information has been acquired, the cited document structure information acquisition unit 31 acquires cited document structure information for all of the document information stored in the document information storage unit 11.
[0132] Although the flowchart of FIG. 10 describes the case in which the cited document structure information is acquired all at once, the structure information for a given piece of cited document information may instead be acquired only when a part of that cited document information is to be acquired and its structure information has not yet been acquired. Doing so avoids acquiring unneeded cited document structure information and thus reduces the processing load.
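The on-demand alternative to batch acquisition can be sketched as a small cache keyed by cited document identification information. This is a sketch under stated assumptions: the class `StructureInfoCache` and the analyzer passed to it are hypothetical stand-ins for the cited document structure information acquisition unit 31.

```python
class StructureInfoCache:
    """Acquire structure information per cited document, only when first needed."""

    def __init__(self, analyze):
        self._analyze = analyze      # hypothetical analyzer: text -> structure info
        self._cache = {}             # cited document ID -> structure info
        self.analyze_calls = 0       # counts how often the analyzer actually ran

    def get(self, doc_id: str, text: str):
        # Analyze only if this document's structure info is not yet held.
        if doc_id not in self._cache:
            self.analyze_calls += 1
            self._cache[doc_id] = self._analyze(text)
        return self._cache[doc_id]

# Placeholder analyzer used only to make the sketch runnable.
cache = StructureInfoCache(lambda text: {"length": len(text)})
first = cache.get("doc-1", "abc")
second = cache.get("doc-1", "abc")   # served from the cache, no second analysis
```

Documents that are never cited by a retrieved document are never analyzed, which is the load reduction the paragraph above refers to.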
[0133] The cited document structure information may also be acquired in advance while the related term acquisition device 2 is not performing related term acquisition, that is, while the processing load on the related term acquisition device 2 is not large.
[0134] As described above, the related term acquisition device 2 according to the present embodiment acquires a part of the cited document information by using the cited document structure information. A desired part of a cited document can therefore be acquired easily even when the structure of the document is not explicit in the cited document information.
[0135] (Embodiment 3)
A related term acquisition device according to Embodiment 3 of the present invention is described with reference to the drawings. The related term acquisition device according to the present embodiment acquires the part of the cited document information that corresponds to an item determined by the citation format of the cited document in the document information.
[0136] The configuration of the related term acquisition device according to the present embodiment is the same as that shown in FIGS. 1 and 2 of Embodiment 1, and its description is omitted. When acquiring a part of the document information, the document information acquisition unit 13 acquires a part that contains at least the cited document identification information and at least the citation information. The cited document information acquisition unit 14 acquires a part of the cited document information, which is information indicating the document identified by the cited document identification information contained in all or part of the document information acquired by the document information acquisition unit 13. Specifically, it acquires the part corresponding to the item predetermined for the citation format of the citation information contained in all or part of that document information. Acquiring the part corresponding to the item predetermined for the citation format includes the case in which, when the citation format is a given format, no part of the cited document information is acquired.
[0137] Here, citation information is information indicating a description of the document identified by the cited document identification information. In the document information, the citation information may be located in the same region as the cited document identification information or in a different region. An example of the former is a case in which the cited document identification information appears in the background art section and the cited document is cited by the identification information itself. Such citations are sometimes used in patent specifications. An example of the latter is a case in which the cited document identification information is listed together in a references section and, in the body of the document, the cited document is cited by a predetermined index corresponding to the identification information in that section. Such citations are common in academic papers. The index is, for example, 1, 2, 3, ...; 1), 2), 3), ...; [1], [2], [3], ...; [A], [B], [C], ...; Patent Document 1, Patent Document 2, Patent Document 3, ...; or Non-Patent Document 1, Non-Patent Document 2, Non-Patent Document 3, ...; any index may be used as long as it can specify the cited document identification information. When the cited document is cited by the identification information itself, the citation information is information containing that identification information, for example the information in a predetermined region containing it. When the cited document is cited by a predetermined index, the citation information is information containing that index, for example the information in a predetermined region containing it. In either case, the predetermined region may be a sentence, a paragraph, or some other unit of text. When the citation information and the cited document identification information are located in different regions and the document information acquisition unit 13 acquires a part of the document information, it may acquire a single part containing both, or two or more parts that each contain one of them. When the document information acquisition unit 13 acquires all or part of document information containing citation information, the citation information may or may not be explicitly marked in that document information. If it is explicitly marked, the citation information can be located easily in all or part of the document information.
Next, the citation formats in the citation information are described. The citation formats include, for example, a problem-pointing citation format (hereinafter also called the "type C" citation format), which points out a problem in the cited document; an argument-basis citation format (hereinafter also called the "type B" citation format), which adopts the cited document as a basis for the argument; and other citation formats (hereinafter also called the "type O" citation format). The type B citation format is used when a new theory is proposed or a system is built by using existing research results. Other classifications of citation formats may also be used. The cited document information acquisition unit 14 may, for example, hold one or more clue phrases for each citation format in a recording medium (not shown) and determine the citation format according to whether a clue phrase appears at a predetermined position in the citation information. The predetermined position may be, for example, the beginning of the citation information, the end of the citation information, or anywhere in the citation information. Any other method may be used as long as the citation format can be determined from the citation information.
[0139] Next, the operation of the related term acquisition device 1 according to the present embodiment is described with reference to flowcharts. The overall operation of the related term acquisition device 1 according to the present embodiment is the same as in the flowchart of FIG. 3 of Embodiment 1, and its description is omitted. The flowchart detailing the document information acquisition process of step S102 in FIG. 3 is also the same as the flowchart of FIG. 4 of Embodiment 1, and its description is omitted. However, in step S204 of FIG. 4, the document information acquisition unit 13 also acquires the citation information. The flowchart detailing the related term information acquisition process of step S104 in FIG. 3 is likewise the same as the flowchart of FIG. 6 of Embodiment 1, and its description is omitted.
[0140] FIG. 11 is a flowchart showing the details of the cited document acquisition process of step S103 in the flowchart of FIG. 3. The processing other than steps S601 to S603 is the same as in the flowchart of FIG. 5 of Embodiment 1, and its description is omitted.
[0141] (Step S601) The cited document information acquisition unit 14 acquires all of the cited document identification information contained in the i-th document information acquired by the document information acquisition unit 13, together with the citation information corresponding to each piece of cited document identification information. If the citation information is not explicitly marked in all or part of the document information acquired by the document information acquisition unit 13, the cited document information acquisition unit 14 can identify the citation information in the same manner as described for the citation information acquisition process of the document information acquisition unit 13.
[0142] (Step S602) The cited document information acquisition unit 14 determines the citation format of the citation information corresponding to the j-th cited document identification information. Determining the citation format may mean, for example, storing the j-th cited document identification information in a recording medium (not shown) in association with information indicating the citation format. The details of this process are described later with reference to the flowchart of FIG. 12.
[0143] (Step S603) The cited document information acquisition unit 14 searches the document information storage unit 11 for the cited document information identified by the j-th cited document identification information acquired in step S601, and acquires from the document information storage unit 11 the part of the retrieved cited document information that corresponds to the citation format determined in step S602.
[0144] FIG. 12 is a flowchart showing the details of the citation format determination process of step S602 in the flowchart of FIG. 11. As described above, the cited document information acquisition unit 14 is assumed to hold one or more clue phrases for each citation format in a recording medium (not shown).
[0145] (Step S701) The cited document information acquisition unit 14 sets a counter k to 1.
(Step S702) The cited document information acquisition unit 14 determines whether the citation information contains the k-th clue phrase of type C. If it does, the process proceeds to step S703; otherwise, it proceeds to step S704.
[0146] (Step S703) The cited document information acquisition unit 14 determines the citation format to be type C. The series of processes then ends, and the process proceeds to step S603 in the flowchart of FIG. 11.
(Step S704) The cited document information acquisition unit 14 increments the counter k by 1.
[0147] (Step S705) The cited document information acquisition unit 14 determines whether a k-th clue phrase of type C exists. If it does, the process returns to step S702; if it does not, the process proceeds to step S706.
(Step S706) The cited document information acquisition unit 14 sets the counter k to 1.
[0148] (Step S707) The cited document information acquisition unit 14 determines whether the citation information contains the k-th clue phrase of type B. If it does, the process proceeds to step S708; otherwise, it proceeds to step S709.
[0149] (Step S708) The cited document information acquisition unit 14 determines the citation format to be type B. The series of processes then ends, and the process proceeds to step S603 in the flowchart of FIG. 11.
(Step S709) The cited document information acquisition unit 14 increments the counter k by 1.
[0150] (Step S710) The cited document information acquisition unit 14 determines whether a k-th clue phrase of type B exists. If it does, the process returns to step S707; if it does not, the process proceeds to step S711.
[0151] (Step S711) The cited document information acquisition unit 14 determines the citation format to be type O. The series of processes then ends, and the process proceeds to step S603 in the flowchart of FIG. 11.
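Steps S701 to S711 amount to checking the type C clue phrases first, then the type B clue phrases, and falling back to type O. A minimal sketch follows; the clue phrases below are placeholders invented for illustration (they are not the dictionaries of FIGS. 13 and 14), and the function name `decide_citation_format` and the case-insensitive substring matching are assumptions.

```python
# Placeholder clue phrases for illustration only; NOT the contents of FIG. 13 / FIG. 14.
TYPE_C_CUES = ["however", "problem", "fails"]
TYPE_B_CUES = ["based on", "we use", "we adopt"]

def decide_citation_format(citation_info: str) -> str:
    """Return "C", "B", or "O" following the order of steps S701 to S711."""
    text = citation_info.lower()
    for cue in TYPE_C_CUES:      # S701-S705: loop over the type C clue phrases
        if cue in text:
            return "C"           # S703
    for cue in TYPE_B_CUES:      # S706-S710: loop over the type B clue phrases
        if cue in text:
            return "B"           # S708
    return "O"                   # S711: neither type matched

format_c = decide_citation_format("However, the method in [3] fails on long terms.")
format_b = decide_citation_format("Our system is based on the parser of [2].")
format_o = decide_citation_format("Term extraction is studied in [1].")
```

Because type C is tested first, citation information that contains clue phrases of both types is classified as type C, as in the flowchart.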
[0152] Next, the operation of the related term acquisition device 1 according to the present embodiment is described using a specific example. In this specific example, the document information storage unit 11 stores about 12,000 full-text papers (document information) in PostScript and PDF format, mainly from the field of natural language processing. About 8,000 of them are included in the ACL Anthology provided by the ACL. The remaining approximately 4,000 consist of papers collected from the web pages of natural language processing researchers and laboratories in Japan and abroad, paper data (document information) extracted from the proceedings (CD-ROM) of international conferences, and the like. Thus, the document information stored in the document information storage unit 11 in this specific example is of a single type, namely academic papers.
[0153] In this specific example, the document information acquisition unit 13 acquires, from each piece of document information whose title contains the term indicated by the term information received by the term information receiving unit 12, the information in the references section of that document information and the corresponding citation information.
[0154] In this specific example, the cited document information acquisition unit 14 holds, in a recording medium (not shown), a type C clue phrase dictionary, which is information containing the clue phrases for type C, and a type B clue phrase dictionary, which is information containing the clue phrases for type B. FIG. 13 shows an example of the type C clue phrase dictionary. FIG. 14 shows an example of the type B clue phrase dictionary. The cited document information acquisition unit 14 determines the citation format to be type C when the citation information contains any clue phrase in the type C clue phrase dictionary, and determines it to be type B when the citation information contains any clue phrase in the type B clue phrase dictionary.
[0155] In this specific example, the cited document information acquisition unit 14 acquires the title of the cited document as the part of the cited document.
First, suppose the user inputs the term information "terminology" into the related term acquisition device 1. The term information "terminology" is then received by the term information receiving unit 12 (step S101). The document information acquisition unit 13 then acquires, from each piece of document information whose title contains the term "terminology", the references (Reference) section and the citation information (steps S102, S201 to S205).
[0156] Specifically, the document information acquisition unit 13 searches for each piece of document information whose title contains the term "terminology" and acquires the bibliographic information of the retrieved document information (step S201). FIG. 15 shows an example of the acquired bibliographic information. As shown in FIG. 15, the bibliographic information includes "author", "title", and "other". "Author" is information on the one or more authors of the paper, "title" is information on the title of the paper, and "other" is information such as the name of the journal and the year of publication. In FIG. 15, an "ID" is associated with each record; this "ID" exists for managing the records in the table.
[0157] From the document information identified by each record shown in FIG. 15, the document information acquisition unit 13 acquires the references (Reference) section and the citation information corresponding to the cited document identification information contained in that section (step S204).
[0158] Here, the method of acquiring the citation information in this specific example is described. As described above, the document information acquisition unit 13 acquires from the references section the predetermined index corresponding to the cited document identification information listed there, and searches for the sentence containing that index. Next, the document information acquisition unit 13 extracts a sentence closely connected to the sentence containing the index by using the clue phrases (1) to (6) below. That is, when the sentence following the sentence containing the index contains any of the clue phrases (1) to (6), the document information acquisition unit 13 extracts that following sentence as a sentence closely connected to the sentence containing the index. The document information acquisition unit 13 acquires, as the citation information, the sentence containing the index together with the sentence closely connected to it.
[0159] (1) Clues related to anaphors: In this, On this, Such
(2) Clues related to conjunctions: But, However, Although
(3) Clues related to the first person: We, we, Our, our, us, I
(4) Clues related to the third person: They, they, Their, their, them
(5) Clues related to adverbs: Furthermore, Additionally, Still
(6) Other clues: In particular, follow, For example
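The extraction rule using clue phrases (1) to (6) can be sketched as follows. This is a minimal illustration that assumes the body text has already been split into sentences; the function name `citation_info`, the whole-word regular-expression matching, and the sample sentences are assumptions, not part of the described device.

```python
import re

# Clue phrases (1) to (6) as listed in the text.
CUES = ["In this", "On this", "Such", "But", "However", "Although",
        "We", "we", "Our", "our", "us", "I",
        "They", "they", "Their", "their", "them",
        "Furthermore", "Additionally", "Still",
        "In particular", "follow", "For example"]
# Assumption: clue phrases are matched case-sensitively as whole words.
CUE_RE = re.compile(r"\b(?:" + "|".join(re.escape(c) for c in CUES) + r")\b")

def citation_info(sentences: list, index: str) -> list:
    """Collect each sentence containing the index, plus the next sentence
    when that next sentence contains any of the clue phrases."""
    picked = set()
    for i, sentence in enumerate(sentences):
        if index in sentence:
            picked.add(i)
            if i + 1 < len(sentences) and CUE_RE.search(sentences[i + 1]):
                picked.add(i + 1)
    return [sentences[i] for i in sorted(picked)]

# Hypothetical body text of a citing paper.
sentences = [
    "Term extraction has a long history.",
    "A statistical method was proposed in [1].",
    "However, it needs a large corpus.",
    "Evaluation is described later.",
]
info = citation_info(sentences, "[1]")
```

With whole-word matching, the clue "follow" does not match "following" and "us" does not match "uses"; whether the original system matched substrings or words is not stated in the text.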
[0160] Next, the cited document information acquisition unit 14 determines the citation format on the basis of the part of the document information acquired by the document information acquisition unit 13 (that is, the references section and the citation information corresponding to each piece of cited document identification information), and acquires the part of the cited document corresponding to the citation format (step S103; flowcharts of FIGS. 11 and 12).
[0161] In determining the citation format, as described above, the citation format is determined to be type C when the citation information contains any clue phrase in the type C clue phrase dictionary shown in FIG. 13, to be type B when it contains any clue phrase in the type B clue phrase dictionary shown in FIG. 14, and to be type O when it is neither type C nor type B (steps S701 to S711). When the citation format is type C, the cited document information acquisition unit 14 acquires, as the part of the cited document information, information indicating the title of the cited document information. When the citation format is type B, the cited document information acquisition unit 14 acquires, as the part of the cited document information, information indicating the summary (abstract) of the cited document information. When the citation format is type O, the cited document information acquisition unit 14 does not acquire any part of the cited document information.
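The mapping from citation format to the acquired part in this specific example (type C: title, type B: abstract, type O: nothing) can be sketched as below. The function name `part_for_citation_format` and the dictionary layout of a cited document are hypothetical.

```python
from typing import Optional

def part_for_citation_format(citation_format: str, cited_doc: dict) -> Optional[str]:
    """Return the part of the cited document selected by the citation format."""
    if citation_format == "C":
        return cited_doc.get("title")      # problem-pointing citation: title
    if citation_format == "B":
        return cited_doc.get("abstract")   # argument-basis citation: abstract
    return None                            # type O: no part is acquired

# Hypothetical cited document information.
cited = {"title": "Automatic Term Recognition", "abstract": "We present a method."}
parts = [part_for_citation_format(f, cited) for f in ("C", "B", "O")]
```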
[0162] After this, the related term information acquisition unit 15 acquires related term information from the set of parts of the cited document information acquired by the cited document information acquisition unit 14 (step S104; flowchart of FIG. 6), and the related term information output unit 16 outputs the acquired related term information (step S105).
[0163] Needless to say, the method of acquiring the citation information is not limited to the specific description above. Another method of acquiring the citation information is described here. For example, as described above, the document information acquisition unit 13 acquires from the reference list a predetermined index corresponding to the cited document identification information listed there, searches for sentences containing that index, and identifies the sentences that contain it. The document information acquisition unit 13 then acquires, as the citation information, the sentence containing the index together with the sentences closely related to it. The method of identifying the sentences closely related to the sentence containing the index is described next.
The document information acquisition unit 13 is assumed to have access to the following two kinds of cue phrases, which are stored in advance on a recording medium (not shown).
(A) Cue phrases for expansion toward preceding sentences
In this, On this, Such, But, However, ...
(B) Cue phrases for expansion toward following sentences
We, we, They, they, Our, our, In this, On this, Such, But, However, ...
[0165] The document information acquisition unit 13 first takes the sentence containing the index as the candidate sentences for the citation information. If the first of the candidate sentences contains at least one of the (A) cue phrases, the sentence immediately preceding the candidate sentences is added to them. The document information acquisition unit 13 repeats this expansion toward preceding sentences until the first candidate sentence contains none of the (A) cue phrases. Likewise, if the sentence following the candidate sentences (the sentence adjacent to them on the trailing side and not yet among them) contains any of the (B) cue phrases, that sentence is added to the candidate sentences. The document information acquisition unit 13 repeats this expansion toward following sentences until the sentence following the candidate sentences contains none of the (B) cue phrases. The candidate sentences at the point where expansion ends become the citation information, which the document information acquisition unit 13 acquires. The (A) and (B) cue phrases are only examples; other cue phrases may be used as long as sentences closely related to the sentence containing the index can be extracted appropriately. Although the description here focuses on the sentence containing the index, in technical documents that refer to a cited document directly by its cited document identification information, the sentence containing the cited document identification information is used in place of the sentence containing the index.
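The expansion procedure of paragraph [0165] can be sketched as below. The two cue phrase lists are the (A) and (B) examples given in the text (which are themselves open-ended); the function names, the use of a pre-split sentence list, and the choice of whole-word, case-sensitive matching are assumptions of this sketch.

```python
import re

# (A) cues: extend the citation area toward preceding sentences.
PRECEDING_CUES = ["In this", "On this", "Such", "But", "However"]
# (B) cues: extend the citation area toward following sentences.
FOLLOWING_CUES = ["We", "we", "They", "they", "Our", "our",
                  "In this", "On this", "Such", "But", "However"]


def has_cue(sentence, cues):
    """True if the sentence contains any cue as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(cue) + r"\b", sentence)
               for cue in cues)


def expand_citation_area(sentences, index_pos):
    """Return the citation information around sentences[index_pos]."""
    start = end = index_pos
    # While the first candidate sentence holds an (A) cue, pull in the
    # sentence before it.
    while start > 0 and has_cue(sentences[start], PRECEDING_CUES):
        start -= 1
    # While the sentence after the candidates holds a (B) cue, pull it in.
    while end + 1 < len(sentences) and has_cue(sentences[end + 1], FOLLOWING_CUES):
        end += 1
    return sentences[start:end + 1]
```

Whole-word matching is used so that, for instance, "we" does not fire on "answer"; the text only says the cue must be "contained", so a plain substring test would be an equally defensible reading.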
[0166] Needless to say, the method of determining the citation format is also not limited to the specific description above. Another method of determining the citation format is described here. For example, instead of determining the citation format solely by whether the citation information contains a predetermined cue phrase, the document information acquisition unit 13 may also take into account the position of the cue phrase within the citation information. More specifically, in the type C cue phrase dictionary, for example, each type C cue phrase (labelled "cue word/phrase" in FIG. 20) is associated with a cue phrase position, as shown in FIG. 20. The "cue phrase position" in FIG. 20 is a sentence count within the citation information, taking the sentence that contains the index or the cited document identification information as the reference (0). For example, a cue phrase position of "1 to 3" means the three sentences from the sentence immediately following the reference sentence (cue phrase position = 1) to the third sentence after it (cue phrase position = 3). A negative cue phrase position means a sentence preceding the sentence that contains the index or the cited document identification information. The document information acquisition unit 13 may determine that the citation format of the citation information is type C when a type C cue phrase is contained in a sentence indicated by the corresponding cue phrase position. More specifically, the document information acquisition unit 13 may determine that the citation format is type C when "However" is contained in any of the first to third sentences following the sentence that contains the index or the cited document identification information. Although the method of determining whether the citation format is type C has been described here, the same applies to determining whether the citation format is type B.
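A position-aware check of the kind described in paragraph [0166] might look like the following sketch. Only the "However" entry with positions 1 to 3 comes from the text; the table layout, the function name, and the plain substring test are assumptions, and the actual contents of FIG. 20 are not reproduced here.

```python
# Each entry: (cue phrase, first offset, last offset), where offset 0 is the
# sentence containing the index or the cited document identification
# information and negative offsets are sentences before it.
TYPE_C_POSITIONAL_CUES = [("However", 1, 3)]


def matches_positional_cue(sentences, ref_pos, cue_table):
    """True if some cue occurs in a sentence within its allowed offsets."""
    for cue, first, last in cue_table:
        for offset in range(first, last + 1):
            i = ref_pos + offset
            if 0 <= i < len(sentences) and cue in sentences[i]:
                return True
    return False
```

The same function could be applied to a type B table, since the text states that type B is determined in the same manner.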
[0167] As described above, the related term acquisition device 1 according to the present embodiment can acquire the part of the cited document information that corresponds to the item chosen according to the citation format of the cited document. By setting appropriately which item's part of the cited document information is acquired for each citation format, related term information may be acquired with higher accuracy.
[0168] In the present embodiment, as in the specific example above, the case has been described in which information indicating the title of the cited document information is acquired when the citation format is type C and information indicating the abstract of the cited document information is acquired when the citation format is type B, but other settings may be used. For example, when the citation format of the citation information contained in all or part of the document information acquired by the document information acquisition unit 13 is the problem-pointing citation format, the cited document information acquisition unit 14 may acquire the part of the cited document information that has background items; when that citation format is the argument-basis citation format, it may acquire the part of the cited document information that has characteristic items. Here, the background items may be, for example, "Prior Art" or "Background Art" for a patent document, and "Introduction" or "Related Work" for an academic document. The characteristic items may be, for example, "Embodiments" or "Effects" for a patent document, and "Concluding Remarks" or "Conclusion" for an academic document. The methods described in Embodiments 1 and 2 can be used to acquire the parts of the cited document information corresponding to these items.
Needless to say, the process of acquiring the part of the cited document information according to the citation format may also be executed in the related term acquisition device 2 of Embodiment 2.
[0169] In each of the above embodiments, the document information stored in the document information storage unit 11 may be of two or more types. For example, the document information storage unit 11 may store at least a first type of document information and a second type of document information; the document information acquisition unit 13 may acquire all or part of a piece of the first type of document information stored in the document information storage unit 11, and the cited document information acquisition unit 14 may acquire the part of the cited document information from the second type of document information stored in the document information storage unit 11. For example, the first type of document information may be information on academic papers and the second type may be information on patent documents, or vice versa. Users who are accustomed to academic papers are often unfamiliar with the terms used in patent documents, but with such a related term acquisition device they can obtain, from a term used in academic papers, the related terms used in patent documents. The first type of document information may also be information in a first language and the second type information in a second language. The first and second languages may be any languages, for example English, Japanese, Chinese, French, or German. In this way it may become possible, for example, to acquire English related terms for a Japanese term.
[0170] (Embodiment 4)
A technical trend information creation device according to Embodiment 4 of the present invention will be described with reference to the drawings.
FIG. 16 is a block diagram showing the configuration of the technical trend information creation device according to the present embodiment. In FIG. 16, the technical trend information creation device 1001 according to the present embodiment includes a search term receiving unit 1011, a document information storage unit 1012, a search unit 1013, a specifying unit 1014, a document information acquisition unit 1015, a technical term extraction unit 1016, a time information acquisition unit 1017, a technical trend information creation unit 1018, and an output unit 1019.
[0171] The search term receiving unit 1011 receives a search term. A search term is information used as a search keyword in the search performed by the search unit 1013, described later. Strictly speaking, what the search term receiving unit 1011 receives is information indicating a search term, but it is simply called a search term here. In the technical trend information creation device 1001, technical documents are searched using the search term received by the search term receiving unit 1011, information indicating the retrieved technical documents is acquired, and technical trend information is created from that information. The search term is thus used to decide, among other things, the field for which technical trend information is created. The search term is generally text data but may be other information. The search term may be, for example, information indicating the field of the technical trend information to be created, or information indicating the period of the technical trend information to be created. When the search term indicates the field of the technical trend information, it may be, for example, the name of that field or a symbol associated with that field. Symbols associated with fields include, for example, the IPC (International Patent Classification) and the US class (United States Patent Classification). In the present embodiment, the case is described in which the search term receiving unit 1011 receives a search term indicating the name of the field of the technical trend information.
[0172] The search term receiving unit 1011 may receive the search term, for example, as information entered from an input device (such as a keyboard, mouse, or touch panel), as information transmitted over a wired or wireless communication line, or as information read from a predetermined recording medium (such as an optical disk, magnetic disk, or semiconductor memory). The search term receiving unit 1011 may or may not include a device for receiving (such as a modem or network card). The search term receiving unit 1011 may be implemented in hardware, or in software such as a driver that drives a predetermined device.
[0173] The document information storage unit 1012 stores at least a plurality of pieces of document information and a plurality of pieces of time information. Document information is information indicating at least part of a technical document. Technical documents are, for example, technical papers, patent and utility model gazettes, published technical bulletins, technology-related books and articles, and web pages about technology. One piece of document information contains at least part of the information (data) of the one corresponding technical document. The part of a technical document is a part that includes at least the range needed in later processing. For example, when the title and the references of a technical document are used in later processing, the document information may be information indicating the title and the references of the technical document. The document information may also be information indicating the whole of the technical document. Time information is information indicating a time relating to the technical document indicated by the document information. The time information may be, for example, information indicating when the document information was created, information indicating when the document information was published, or information indicating some other time relating to the document information. The time information may indicate, for example, a year, a year and month, a full date, or some other time relating to the document information. The document information and the time information are assumed to be associated with each other, directly or indirectly, so that it is known which technical document belongs to which time. In the direct case, for example, corresponding document information and time information may be contained in the same record. In the indirect case, for example, the document information and the time information may be associated through a document ID that identifies the technical document. The document information and the time information may be text files, or files in a proprietary format such as that of word processing software. The document information and the time information may be configured as a single piece of data or as separate data.
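The direct and indirect associations between document information and time information in paragraph [0173] can be illustrated with a small data-structure sketch. The class name, field names, and lookup function are all hypothetical; the text does not prescribe any particular record layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DocumentRecord:
    doc_id: str                 # document ID identifying the technical document
    title: str                  # part of the document information
    year: Optional[int] = None  # time information held in the same record


def time_of(record: DocumentRecord, time_by_doc_id: Dict[str, int]) -> Optional[int]:
    """Use the record's own time if present, else look it up by document ID."""
    if record.year is not None:
        return record.year                     # direct association
    return time_by_doc_id.get(record.doc_id)   # indirect association
```

A year is used as the time information here for brevity; the text equally allows a year and month or a full date.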
[0174] Needless to say, the document information storage unit 1012 may store other information as well. For example, bibliographic information may be stored in the document information storage unit 1012, such as the author names, publisher name, and medium name (for example, journal name or book title) of a technical document, information indicating the field of the technical document (for example, the IPC), and the abstract of the technical document. The document information may also include, for example, cited document identification information and citation information. Cited document identification information is information that identifies another technical document cited in the document information (hereinafter such a technical document may be called a "cited document"). Citation information is information indicating a description of another technical document cited in the document information that contains the citation information. A description of a technical document is, for example, a description that points out a problem in that technical document, or a description that proposes a new theory or builds a system on the basis of that technical document. The content of the cited document identification information is not restricted as long as it can identify the cited document. The cited document identification information may be, for example, an ID that identifies the cited document, such as a patent number, the publication number of a patent application, or a PMID (PubMed Identifier), or it may be bibliographic information capable of identifying the cited document, such as the author names, title, date of authorship, and information about the medium (journal name, publisher name, URL, and so on). In the present embodiment, the document information is assumed to include at least a title, cited document identification information, and citation information.
The cited document identification information and the citation information are now described in more detail. In the document information, the citation information may, for example, be located in the same area as the cited document identification information, or in a different area. An example of the former is the case where the cited document identification information appears in the background art section and the cited document is cited by the cited document identification information itself. Such citations are sometimes made in patent specifications and the like. An example of the latter is the case where the cited document identification information is listed together in a reference list, and in the body of the document the cited document is cited by a predetermined index corresponding to the cited document identification information in that reference list. Such citations are common in academic documents. Here, an index is, for example, 1, 2, 3, ...; 1), 2), 3), ...; [1], [2], [3], ...; [A], [B], [C], ...; Patent Document 1, Patent Document 2, Patent Document 3, ...; or Non-Patent Document 1, Non-Patent Document 2, Non-Patent Document 3, ...; it may be anything that can specify the cited document identification information. When a cited document is cited by the cited document identification information itself, the citation information is information that includes that cited document identification information, for example the information in a predetermined area that includes it. The predetermined area may be, for example, a sentence, a paragraph, or some other unit of text. When a cited document is cited by an index corresponding to the cited document identification information, the citation information is information that includes that index, for example the information in a predetermined area that includes it. The predetermined area may be, for example, a sentence, a paragraph, or some other unit of text. As described later, this predetermined area may be determined on the basis of a predetermined rule.
[0176] Next, the citation format of a cited document in the citation information is described. Citation formats include, for example, the problem-pointing citation format, which points out a problem in the cited document (hereinafter also called the "type C" citation format); the argument-basis citation format, which takes the cited document as the basis of an argument (hereinafter also called the "type B" citation format); and all other citation formats (hereinafter also called the "type O" citation format). The type B citation format is used when a new theory is proposed or a system is built using existing research results. Other classifications of citation formats may also be used. The citation formats described above are explained in the following document.
[0177] Reference: Hidetsugu Nanba and Manabu Okumura, "Development of a Survey Paper Creation Support System Considering Reference Information Between Papers", Natural Language Processing, Vol. 6, No. 5, pp. 43-62, 1999
[0178] The document information storage unit 1012 can be realized by a predetermined recording medium (for example, a semiconductor memory, a magnetic disk, or an optical disk). The process by which document information and the like come to be stored in the document information storage unit 1012 is not restricted. For example, the document information and the like may be stored in the document information storage unit 1012 via a recording medium, document information and the like transmitted over a communication line may be stored there, or document information and the like entered through an input device may be stored there.
[0179] The search unit 1013 searches the document information storage unit 1012 using the search term received by the search term receiving unit 1011. The search unit 1013 preferably retrieves the document information corresponding to technical documents whose main topic is the received search term. "Document information corresponding to a technical document" means document information indicating at least part of that technical document. A main topic is the name of a research field. How broad or narrow it is can vary with the range of interest. For example, "morphological analysis" can be a main topic, but so can "natural language processing", which contains it.
[0180] For such a search, the search unit 1013 may search the whole of each piece of document information stored in the document information storage unit 1012, may search part of each piece of document information (for example, only the title), or may search information other than the document information. When part of the document information is searched, the document information is assumed to have a structure from which that part can be identified. For example, an item name may be written for each part of the document information, or the items may be made identifiable by markup language tags or the like. "Information other than the document information" may be, for example, time information or other bibliographic information.
[0181] To retrieve the document information corresponding to technical documents whose main topic is the search term, the search unit 1013 may, for example, retrieve document information in which the search term appears a predetermined number of times or more, or retrieve document information in which the search term appears in a plurality of predetermined areas (for example, the title and the abstract). Methods of retrieving the document information corresponding to technical documents whose main topic is a given search term are already known from the following documents and elsewhere, so a detailed description is omitted.
[0182] Reference: Ricardo Baeza-Yates and Berthier Ribeiro-Neto, "Modern Information Retrieval", Addison-Wesley, ISBN: 020139829X, 1999
[0183] Reference: Takenobu Tokunaga, "Information Retrieval and Language Processing", University of Tokyo Press, ISBN: 4130654055, 1999
[0184] Reference: Nitin Indurkhya, Tong Zhang, and Fred J. Damerau, "Text Mining: Predictive Methods for Analyzing Unstructured Information", Springer, ISBN: 0387954333, 2004
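The two main-topic criteria mentioned in paragraph [0181] (the search term appearing at least a predetermined number of times, or appearing in every one of several predetermined areas) can be sketched as follows. The dictionary-based document layout, the function names, the case-insensitive matching, and the default threshold of 3 are assumptions of this sketch, not values taken from the text or from the cited references.

```python
def is_main_topic_by_count(doc, term, min_count=3):
    """True if the term occurs at least min_count times across all areas."""
    text = " ".join(doc.values()).lower()
    return text.count(term.lower()) >= min_count


def is_main_topic_by_areas(doc, term, areas=("title", "abstract")):
    """True if the term occurs in every one of the given areas."""
    return all(term.lower() in doc.get(area, "").lower() for area in areas)
```

Here each document is a mapping from area name to text, which also reflects the requirement in paragraph [0180] that the parts of the document information be identifiable.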
[0185] When the document information includes at least the title of the technical document, the search unit 1013 may retrieve document information for which the search term received by the search term receiving unit 1011 is contained in the area of the title that indicates the main topic. In this way, it is considered possible to retrieve the document information corresponding to technical documents whose main topic is the search term. The area of a title that indicates the main topic may be, for example, the last area of the title for a Japanese title and the first area of the title for an English title.
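The title-based criterion of paragraph [0185] can be sketched as below. The text says only that the main topic lies in the "last area" of a Japanese title and the "first area" of an English title; measuring that area as a fixed number of trailing characters or leading words, and the names and language codes used here, are assumptions made for illustration.

```python
def title_topic_area(title, language, width):
    """Return the part of the title assumed to carry the main topic."""
    if language == "ja":
        return title[-width:]                    # last `width` characters
    if language == "en":
        return " ".join(title.split()[:width])   # first `width` words
    raise ValueError("unsupported language: " + language)


def title_has_main_topic(title, term, language, width):
    """True if the search term lies inside the title's main-topic area."""
    return term in title_topic_area(title, language, width)
```

For example, with a width of 3 words, "Morphological analysis using neural networks" matches the term "Morphological analysis", whereas "Neural networks for morphological analysis" does not.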
[0186] The search processing itself performed by the search unit 1013 is already known, so a detailed description is omitted. As a result of the search by the search unit 1013, one or more pieces of document information are generally specified as the search result. This specification may be performed, for example, by storing information identifying the document information in a recording medium such as temporary memory, or by setting a flag in association with the document information.
[0187] The specifying unit 1014 specifies document information in the same field as the document information retrieved by the search unit 1013. Through this specification, document information that the search unit 1013 did not retrieve, but that is in the same field as the retrieved document information, can also be used to create the technical trend information. Specification methods include, for example, (1) a method using citation relationships and (2) a method using document classification or document clustering.
[0188] (1) Method using citation relationships
The method using citation relationships specifies document information on the assumption that document information corresponding to a technical document cited in the document information retrieved by the search unit 1013 is in the same field as that retrieved document information. For this specification there are, for example, (1-1) a method using only cited document identification information and (1-2) a method using cited document identification information and the citation format.
[0189] (1-1) Method using only cited document identification information
The specifying unit 1014 acquires, from the document information retrieved by the search unit 1013, cited document identification information, which is information identifying other technical documents cited in that document information. It then specifies document information by the acquired cited document identification information.
[0190] In acquiring the cited document identification information, the specifying unit 1014 may acquire the references section, which is expected to contain the cited document identification information; may acquire the patent document and non-patent document sections; or may acquire the region in which the cited document identification information appears by using a technique such as pattern matching. When acquiring the references section, the patent document section, or the like, information indicating the item corresponding to the part to be acquired may, for example, be stored in a recording medium (not shown), and the specifying unit 1014 may read that information from the recording medium and acquire the region of the document information corresponding to the item. When acquiring the cited document identification information by pattern matching or the like, information indicating the patterns to be used may, for example, be stored in a recording medium (not shown), and the specifying unit 1014 may read the patterns from the recording medium and perform pattern matching to acquire the region of the document information that contains the cited document identification information. The cited document identification information acquired by the specifying unit 1014 may be temporarily stored in a recording medium (not shown).
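As one possible reading of the pattern-matching approach in (1-1), the following Python sketch locates a references section and pulls out its numbered entries. The heading pattern, the `[n] entry` layout, and the function name `extract_cited_ids` are assumptions; actual documents would need patterns matched to their own formats.

```python
import re

# Hypothetical patterns: a line consisting only of a references heading,
# followed by entries of the form "[1] bibliographic text".
REFERENCE_HEADING = re.compile(r"^(References|参考文献)\s*$", re.MULTILINE)
ENTRY = re.compile(r"^\[(\d+)\]\s*(.+)$", re.MULTILINE)


def extract_cited_ids(text):
    """Map each reference index to its entry text (the cited document identification).

    Returns an empty dict when no references heading is found.
    """
    heading = REFERENCE_HEADING.search(text)
    if heading is None:
        return {}
    section = text[heading.end():]
    return {int(index): entry.strip() for index, entry in ENTRY.findall(section)}
```

Keeping the index alongside the entry text is a design choice that makes it possible to resolve in-text references such as "[1]" back to the cited document later.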
[0191] (1-2) Method using cited document identification information and citation format
The specifying unit 1014 acquires, from the document information retrieved by the search unit 1013, the cited document identification information together with citation information, which is information indicating the description of another technical document cited in that document information. Then, when the citation format indicated by the acquired citation information is the problem-pointing citation format (type C citation format), that is, a format that points out a problem with the cited technical document, the specifying unit 1014 specifies document information by the cited document identification information identifying the technical document described in that citation information.
[0192] For example, when a technical document A cites a technical document B in a format that points out a problem with technical document B (the type C citation format), the inventors' research has shown that technical documents A and B are likely to be in the same field (see the aforementioned document "Development of a Survey Paper Creation Support System Considering Reference Information Between Papers" (論文間の参照情報を考慮したサーベイ論文作成支援システムの開発)).
[0193] Accordingly, when the citation format of the technical document indicated by the citation information is the type C citation format, document information is specified by the cited document identification information identifying that technical document; otherwise, document information need not be specified.
[0194] The specifying unit 1014 may, for example, hold one or more cue phrases corresponding to the type C citation format in a recording medium (not shown) and determine whether the citation format is type C according to whether a cue phrase appears at a predetermined position in the citation information. The predetermined position may be, for example, the beginning of the citation information, the end of the citation information, the whole of the citation information, or a predetermined range relative to the sentence that makes the citation (for example, the sentence containing the cited document identification information or the sentence containing the index corresponding to it). The predetermined position may be set for each cue phrase. Any other method may be used as long as the citation format can be determined from the citation information.
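The cue-phrase test for the type C citation format could look roughly like the Python sketch below. The cue phrases themselves, the position labels (`head`, `tail`, `anywhere`), and the 40-character window are illustrative assumptions; the specification does not list the actual cue phrases or window sizes.

```python
# Hypothetical cue phrases, each paired with the position in the
# citation information where it is expected to appear.
TYPE_C_CUES = [
    ("however", "anywhere"),
    ("although", "head"),
    ("remains unsolved", "tail"),
]


def region(citation_text, position, width=40):
    """Return the slice of the citation information to be searched."""
    if position == "head":
        return citation_text[:width]
    if position == "tail":
        return citation_text[-width:]
    return citation_text


def is_type_c(citation_text, cues=TYPE_C_CUES):
    """True if any cue phrase occurs at its configured position."""
    lowered = citation_text.lower()
    return any(phrase in region(lowered, position) for phrase, position in cues)
```

Storing the position with each cue phrase mirrors the statement that the predetermined position may be set per cue phrase.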
[0195] In acquiring the citation information, the specifying unit 1014 may, for example, use the cited document identification information to locate the region that refers to it and acquire citation information containing that region, or may use the index corresponding to the cited document identification information to locate the region that refers to that index and acquire citation information containing that region. The citation information acquired by the specifying unit 1014 may be temporarily stored in a recording medium (not shown).
[0196] When the specifying unit 1014 has acquired citation information and determined that its citation format is type C, it specifies document information by the cited document identification information identifying the technical document described in that citation information. For example, when the citation information refers to the cited document by its cited document identification information, the document information can be specified by that identification information. When the citation information refers to the cited document by an index, the cited document identification information corresponding to that index is determined, and the document information is specified by it.
[0197] In acquiring the cited document identification information and the citation information in (1-1) and (1-2) above, the specifying unit 1014 may acquire them from the document information storage unit 1012, or, when the document information acquisition unit 1015 described later has acquired document information containing them, may acquire them from the document information acquired by the document information acquisition unit 1015. This embodiment describes the former case.
[0198] (2) Method using document classification or document clustering
Methods of classifying or clustering a plurality of pieces of document information have long been known. They are disclosed, for example, in the documents cited above, "Modern Information Retrieval", "Information Retrieval and Language Processing", and "Text Mining: Predictive Methods For Analyzing Unstructured Information". This method specifies document information on the assumption that pieces of document information classified or clustered into the same group are in the same field. More specifically, the specifying unit 1014 specifies document information classified or clustered into the same group as the document information retrieved by the search unit 1013 as document information in the same field as the retrieved document information. In this case, the document information may be specified, for example, by a document ID identifying the document information or by its bibliographic information (title, year of publication, first author name, name of the medium in which the document was published, and so on).
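Assuming that some classifier or clustering algorithm has already assigned a group label to each document, the specification step in (2) reduces to a lookup, as in this Python sketch. The mapping `cluster_of` and the function name are hypothetical, and how the labels are produced is deliberately left open.

```python
def same_field_documents(cluster_of, retrieved_ids):
    """Return IDs of documents that share a group with any retrieved document.

    cluster_of: dict mapping document ID -> group label (assumed to come
    from any classification or clustering step).
    retrieved_ids: IDs of the documents found by the search.
    """
    retrieved = set(retrieved_ids)
    target_groups = {cluster_of[d] for d in retrieved if d in cluster_of}
    return sorted(
        doc_id
        for doc_id, group in cluster_of.items()
        if group in target_groups and doc_id not in retrieved
    )
```

Excluding the retrieved documents from the result is a choice made here so that the output contains only the additionally specified documents.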
[0199] The document information acquisition unit 1015 acquires all or part of at least some of the plurality of pieces of document information stored in the document information storage unit 1012. The document information acquisition unit 1015 acquires all or part of the document information retrieved by the search unit 1013. When the document information acquisition unit 1015 acquires part of the document information, that part may be, for example, the title or the abstract. For example, information indicating the item corresponding to the part to be acquired may be stored in a recording medium (not shown), and the document information acquisition unit 1015 may read that information from the recording medium and acquire the region of the document information corresponding to the item as the part of the document information. When the document information acquisition unit 1015 acquires part of the document information and the search unit 1013 searches part of the document information, the part acquired and the part used for the search may be the same part of the document information or different parts.
[0200] The document information acquisition unit 1015 also acquires, from the document information storage unit 1012, all or part of the document information specified by the specifying unit 1014. For example, when document information has been specified by cited document identification information, the document information acquisition unit 1015 searches the document information storage unit 1012 using the cited document identification information and acquires all or part of the document information found. Here, the region of the document information that the document information acquisition unit 1015 acquires using the result of the specifying unit 1014 is assumed to be the same as the region it acquires using the search result of the search unit 1013. For example, when the document information acquisition unit 1015 acquires the title of the document information using the search result, it also acquires the title of the specified document information.
The document information acquisition unit 1015 may temporarily store all or part of the document information acquired according to the search result or the specification result in a recording medium (not shown).
[0201] The technical term extraction unit 1016 extracts technical terms from all or part of the document information acquired by the document information acquisition unit 1015. "The document information acquired by the document information acquisition unit 1015" means the document information of which the unit has acquired all or part. Here, a technical term is a term used in the field of the technical documents retrieved by the search unit 1013, and is preferably a technical term that constitutes an element of that field. Observing how these technical terms change over time makes it possible to understand technical trends. What the technical term extraction unit 1016 extracts is, strictly speaking, information indicating a technical term, but it is simply called a technical term here. The method by which the technical term extraction unit 1016 extracts technical terms is described later. The technical term extraction unit 1016 may temporarily store the extracted technical terms in a recording medium (not shown).
[0202] The time information acquisition unit 1017 acquires, from the document information acquisition unit 1015, the time information corresponding to the document information containing the technical terms extracted by the technical term extraction unit 1016. For example, the time information acquisition unit 1017 can acquire this time information by acquiring the time information corresponding to the document information acquired by the document information acquisition unit 1015. In this case, when no technical term was extracted from a piece of document information, the time information corresponding to that document information need not be acquired, or may be left unused in the creation of the technical trend information described later.
[0203] The technical trend information creation unit 1018 creates technical trend information using the technical terms extracted by the technical term extraction unit 1016 and the time information acquired by the time information acquisition unit 1017. Here, technical trend information is information that associates a technical term with a time relating to the technical documents containing that technical term. In other words, the technical trend information shows when a given technical term was in use. In the technical trend information, the time associated with a technical term may be the time information itself (for example, the numerical value of a year and month) or something other than the time information (for example, a graphic corresponding to the year and month). In the latter case, the time associated with the technical term may be shown, for example, using a coordinate axis representing time. The data format of the technical trend information does not matter as long as it associates the technical terms extracted by the technical term extraction unit 1016 with the time information acquired by the time information acquisition unit 1017. For example, it may be a text file, a spreadsheet file, an image file, or a file in some other format. The technical trend information may be, for example, an image showing, for each technical term, the times indicated by the time information corresponding to the document information containing that technical term. The technical trend information may include the search term received by the search term receiving unit 1011.
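A minimal Python sketch of one possible data shape for the technical trend information follows: a mapping from each technical term to the sorted years of the documents that contain it, plus a plain-text rendering. The input dictionaries, the use of a year as the time information, and the tab-separated output are assumptions chosen for illustration; the specification allows any format that associates terms with times.

```python
from collections import defaultdict


def build_trend_info(doc_terms, doc_year):
    """Associate each technical term with the years of the documents containing it.

    doc_terms: dict mapping document ID -> list of extracted technical terms.
    doc_year: dict mapping document ID -> publication year (the time information).
    Documents without extracted terms or without a year are skipped.
    """
    trend = defaultdict(set)
    for doc_id, terms in doc_terms.items():
        year = doc_year.get(doc_id)
        if year is None or not terms:
            continue
        for term in terms:
            trend[term].add(year)
    return {term: sorted(years) for term, years in trend.items()}


def render_text(trend):
    """Render the trend information as tab-separated text, one term per line."""
    return "\n".join(
        f"{term}\t" + ", ".join(str(year) for year in trend[term])
        for term in sorted(trend)
    )
```

The same mapping could equally feed a spreadsheet export or a chart with a time axis.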
[0204] The output unit 1019 outputs the technical trend information created by the technical trend information creation unit 1018. This output may be, for example, display on a display device (such as a CRT or a liquid crystal display), transmission to a predetermined device over a communication line, printing by a printer, or storage in a recording medium. The output unit 1019 may or may not include the device that performs the output (for example, a display device or a printer). The output unit 1019 may be realized by hardware or by software such as a driver that drives such devices. For example, when the technical trend information is an image, the output unit 1019 displays the technical trend information.
[0205] The language handled by the technical trend information creation device 1001 does not matter. It may be, for example, Japanese, English, French, Chinese, or Korean.
[0206] Next, the operation of the technical trend information creation device 1001 according to this embodiment is described using the flowchart of FIG. 17.
(Step S1101) The search term receiving unit 1011 determines whether a search term has been received. If a search term has been received, the process proceeds to step S1102; otherwise, step S1101 is repeated until a search term is received.
[0207] (Step S1102) The search unit 1013 searches the document information storage unit 1012 with the search term received by the search term receiving unit 1011. The search unit 1013 may, for example, acquire the document IDs identifying the retrieved document information.
[0208] (Step S1103) The specifying unit 1014 specifies document information in the same field as the document information retrieved by the search unit 1013. The details of this process are described later.
[0209] (Step S1104) The document information acquisition unit 1015 acquires, from the document information storage unit 1012, all or part of the document information retrieved by the search unit 1013 and all or part of the document information specified by the specifying unit 1014. The details of this process are described later.
[0210] (Step S1105) The technical term extraction unit 1016 extracts technical terms from all or part of the document information acquired by the document information acquisition unit 1015. The details of this process are described later.
[0211] (Step S1106) The time information acquisition unit 1017 acquires, from the document information storage unit 1012, the time information corresponding to the document information containing the technical terms extracted by the technical term extraction unit 1016.
[0212] (Step S1107) The technical trend information creation unit 1018 creates technical trend information using the technical terms extracted by the technical term extraction unit 1016 and the time information acquired by the time information acquisition unit 1017.
[0213] (Step S1108) The output unit 1019 outputs the technical trend information created by the technical trend information creation unit 1018. The process then returns to step S1101. When the output of the technical trend information can continue, as with display output, the output may be ended after a fixed time has elapsed, or may be ended when triggered by, for example, receipt of an end instruction or an interrupt.
In the flowchart of FIG. 17, the process ends when the power is turned off or a process-termination interrupt occurs.
[0214] If, in the search process (step S1102), the document information acquisition process (step S1104), the technical term extraction process (step S1105), the time information acquisition process (step S1106), or the like in the flowchart of FIG. 17, nothing is retrieved or no document information or the like is acquired, the process may treat this as an error and return to step S1101.
[0215] If no document information is specified in the document information specification process (step S1103), the process may either proceed to step S1104 or return to step S1101. This embodiment describes the former case.
[0216] The order of the processes in the flowchart of FIG. 17 is arbitrary to some extent. For example, the process of acquiring the time information may be performed before the document information acquisition process or the technical term extraction process, as long as it follows the document information specification process.
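The flow of steps S1102 to S1107 can be summarized in a short Python sketch. Everything here is a hypothetical scaffold: the `store` layout (`title`, `year`), the injected callables `search`, `specify`, and `extract_terms`, and the choice to signal the error case by returning `None` are assumptions, not the device's actual interfaces.

```python
def create_trend_info(term, store, search, specify, extract_terms):
    """Run a simplified version of steps S1102 to S1107 for one search term.

    store: dict mapping document ID -> {"title": str, "year": int, ...}.
    search, specify, extract_terms: caller-supplied functions standing in
    for the search unit, the specifying unit, and the term extraction unit.
    Returns a dict term -> sorted years, or None when nothing was found.
    """
    found = search(term, store)                    # S1102: search
    if not found:
        return None                                # treated as an error case
    specified = specify(found, store)              # S1103: may be empty
    doc_ids = list(dict.fromkeys(found + specified))  # S1104: merge, keep order
    trend = {}
    for doc_id in doc_ids:
        doc = store[doc_id]
        for tech_term in extract_terms(doc["title"]):          # S1105
            trend.setdefault(tech_term, set()).add(doc["year"])  # S1106, S1107
    return {t: sorted(years) for t, years in trend.items()} or None
```

An empty result from `specify` does not stop the flow, which matches the case adopted in this embodiment where processing continues to step S1104.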
[0217] Next, the details of the document information specification process (step S1103) in the flowchart of FIG. 17 are described using the flowchart of FIG. 18. As noted above, there are various methods of specifying document information; the method described here specifies document information using the cited document identification information and the citation format.
(Step S1201) The specifying unit 1014 sets a counter i to 1.
[0218] (Step S1202) The specifying unit 1014 acquires, from the i-th document information retrieved by the search unit 1013, all of the cited document identification information and the citation information corresponding to each piece of cited document identification information. As noted above, the cited document identification information may or may not be contained in the citation information. If the i-th document information retrieved by the search unit 1013 contains no cited document identification information at all, the process proceeds to step S1203 without acquiring any.
[0219] (Step S1203) The specifying unit 1014 sets a counter j to 1.
(Step S1204) The specifying unit 1014 determines whether a j-th piece of cited document identification information acquired from the i-th retrieved document information exists. If it exists, the process proceeds to step S1205; otherwise, the process proceeds to step S1208.
[0220] (Step S1205) The specifying unit 1014 determines whether the citation format of the citation information corresponding to the j-th cited document identification information is type C. If it is type C, the process proceeds to step S1206; otherwise, the process proceeds to step S1207.
[0221] (Step S1206) The specifying unit 1014 temporarily stores the j-th cited document identification information in a recording medium (not shown). The document information identified by the temporarily stored cited document identification information is the specified document information.
[0222] (Step S1207) The specifying unit 1014 increments the counter j by 1. The process then returns to step S1204.
(Step S1208) The specifying unit 1014 increments the counter i by 1.
[0223] (Step S1209) The specifying unit 1014 determines whether an i-th piece of document information retrieved by the search unit 1013 exists. If it exists, the process returns to step S1202; otherwise, the series of processes for specifying document information ends and the process proceeds to step S1104.
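The double loop of steps S1201 to S1209 can be sketched in Python as below. The record layout (a `citations` list of `(cited_id, citation_text)` pairs), the injected `is_type_c` predicate, and the suppression of duplicate IDs are assumptions added for illustration; the flowchart itself does not mention de-duplication.

```python
def specify_by_citation(retrieved_docs, is_type_c):
    """Collect the IDs of documents cited in the type C format.

    retrieved_docs: list of dicts, each optionally holding "citations",
    a list of (cited_id, citation_text) pairs.
    is_type_c: predicate deciding whether a citation text is type C.
    """
    specified = []
    for doc in retrieved_docs:                                     # counter i
        for cited_id, citation_text in doc.get("citations", []):   # counter j
            # S1205/S1206: keep the cited ID only for type C citations.
            if is_type_c(citation_text) and cited_id not in specified:
                specified.append(cited_id)
    return specified
```

Python's `for` loops replace the explicit counters and existence checks of steps S1203, S1204, S1207, S1208, and S1209; a document with no citations simply contributes nothing.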
[0224] Next, the details of the document information acquisition process (step S1104) in the flowchart of FIG. 17 are described using the flowchart of FIG. 19.
(Step S1301) The document information acquisition unit 1015 acquires all or part of each piece of document information indicated by the search result of the search unit 1013. The details of this process are described later.
[0225] (Step S1302) The document information acquisition unit 1015 determines whether document information has been specified by the specifying unit 1014. If document information has been specified, the process proceeds to step S1303; otherwise, the series of processes for acquiring document information ends and the process proceeds to step S1105.
[0226] (Step S1303) The document information acquisition unit 1015 acquires all or part of the document information specified by the specifying unit 1014. The details of this process are described later. When this process ends, the series of processes for acquiring document information ends and the process proceeds to step S1105.
[0227] Next, the process in the flowchart of FIG. 19 of acquiring all or part of each piece of document information indicated by the search results is described with reference to the flowchart of FIG. 20.
[0228] (Step S1401) The document information acquisition unit 1015 sets the counter i to 1.
(Step S1402) The document information acquisition unit 1015 acquires all or part of the document information indicated by the i-th search result.
[0229] (Step S1403) The document information acquisition unit 1015 temporarily stores all or part of the document information acquired in step S1402 in a recording medium (not shown). This recording medium may be included in the document information acquisition unit 1015 or in another component.
(Step S1404) The document information acquisition unit 1015 increments the counter i by 1.
[0230] (Step S1405) The document information acquisition unit 1015 determines whether the i-th search result exists. If it exists, the process returns to step S1402; otherwise, the series of processes for acquiring all or part of each piece of document information indicated by the search results ends and the process proceeds to step S1302.
[0231] Next, the process in the flowchart of FIG. 18 of acquiring all or part of the specified document information is described with reference to the flowchart of FIG. 21.
(Step S1501) The document information acquisition unit 1015 sets the counter i to 1.
[0232] (Step S1502) The document information acquisition unit 1015 determines whether all or part of the i-th document information specified by the specifying unit 1014 has not yet been acquired. If it has not been acquired, the process proceeds to step S1503; if it has already been acquired, the process proceeds to step S1505. This determination can be made, for example, by referring to the recording medium (not shown) in which the document information acquired by the document information acquisition unit 1015 in step S1403 is temporarily stored.
[0233] (Step S1503) The document information acquisition unit 1015 acquires all or part of the i-th document information specified by the specifying unit 1014.
(Step S1504) The document information acquisition unit 1015 temporarily stores all or part of the document information acquired in step S1503 in a recording medium (not shown). This recording medium may be included in the document information acquisition unit 1015 or in another component.
[0234] (Step S1505) The document information acquisition unit 1015 increments the counter i by 1.
(Step S1506) The document information acquisition unit 1015 determines whether the i-th document information specified by the specifying unit 1014 exists. If it exists, the process returns to step S1502; otherwise, the series of processes for acquiring the specified document information ends and the process proceeds to step S1105.
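The two acquisition loops (steps S1401 to S1405 and steps S1501 to S1506) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `acquire_documents`, the use of document identifiers as keys, and the `fetch` callback that returns all or part of a document are hypothetical and do not appear in the specification. The dictionary stands in for the recording medium (not shown).

```python
def acquire_documents(search_hit_ids, specified_ids, fetch):
    """Acquire search-result documents first, then any specified
    documents that have not been acquired yet.

    `fetch` is a hypothetical callback returning all or part of the
    document information for a given identifier.
    """
    acquired = {}  # stands in for the temporary recording medium

    # Steps S1401-S1405: acquire every document in the search result.
    for doc_id in search_hit_ids:
        acquired[doc_id] = fetch(doc_id)

    # Steps S1501-S1506: acquire specified documents, skipping any
    # that were already acquired (the check made in step S1502).
    for doc_id in specified_ids:
        if doc_id not in acquired:
            acquired[doc_id] = fetch(doc_id)

    return acquired
```

The membership test corresponds to the determination in step S1502, so a document that is both a search result and a specified (cited) document is fetched only once.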
[0235] Next, a method of extracting technical terms from all or part of the document information acquired by the document information acquisition unit 1015 is described.
First, the basic principle of extracting technical terms is described. It is assumed that all or part of the document information acquired by the document information acquisition unit 1015 contains technical terms in the field of the main topic that was searched for or specified. Such technical terms are generally thought to appear together with predetermined clue phrases. In particular, when the acquired document information also contains the name of the main topic, the technical term is often linked to the name of the main topic through a predetermined clue phrase.
[0236] For example, suppose the name of the main topic is "sentence extraction" (文抽出) and the document information acquisition unit 1015 acquires the fragment "sentence extraction using SVM" (SVMを用いた文抽出). The main topic name "sentence extraction" and the technical term "SVM" in that field are then linked through the clue phrase "using" (を用いた). Similarly, suppose the name of the main topic is "caption generation" (字幕生成) and the document information acquisition unit 1015 acquires the fragment "sentence shortening for caption generation" (字幕生成のための文短縮). The main topic name "caption generation" and the technical term "sentence shortening" (文短縮) in that field are then linked through the clue phrase "for" (のための). Even when the acquired document information does not contain the name of the main topic, it is considered possible to identify technical terms by using clue phrases.
[0237] Technical terms can therefore be extracted by determining such clue phrases, searching all or part of the document information acquired by the document information acquisition unit 1015 for them, and extracting the terms that correspond to the clue phrases found. Because this method alone may extract inappropriate technical terms, the inventors have also considered measures to improve accuracy. The configuration and operation of the technical term extraction unit 1016 are described concretely below.
[0238] [Pattern 1]
FIG. 22 is a block diagram showing the configuration of the technical term extraction unit 1016 in Pattern 1. In FIG. 22, the technical term extraction unit 1016 includes clue phrase storage means 1021, clue phrase search means 1022, and technical term extraction means 1023. All or part of the document information acquired by the document information acquisition unit 1015 is assumed to be stored in document information temporary storage means 1020. The document information temporary storage means 1020 may be included in, for example, the document information acquisition unit 1015, the technical term extraction unit 1016, or another component.
[0239] The clue phrase storage means 1021 stores one or more clue phrases. As described above, these clue phrases are used for extracting technical terms and are determined in advance so that appropriate technical terms can be extracted. Strictly speaking, what is stored is information indicating a clue phrase, but it is simply called a clue phrase here. The clue phrase storage means 1021 can be realized by a predetermined recording medium (for example, a semiconductor memory, a magnetic disk, or an optical disk). How the clue phrases come to be stored in the clue phrase storage means 1021 does not matter. For example, they may be stored via a recording medium, they may be received over a communication line or the like and then stored, or they may be entered through an input device and then stored.
[0240] The clue phrase search means 1022 searches all or part of the document information acquired by the document information acquisition unit 1015 for the clue phrases stored in the clue phrase storage means 1021.
[0241] The technical term extraction means 1023 extracts, as a technical term, a term that has a predetermined relationship with a clue phrase found by the clue phrase search means 1022. A term having a predetermined relationship with a clue phrase is, for example, the term immediately before the clue phrase or the term immediately after it. This relationship may be defined for each clue phrase or for each attribute of a clue phrase. The attribute of a clue phrase may be, for example, the relation by which the clue phrase links a technical term to the main topic. Examples of attributes are "Method" and "Goal". Clue phrases with the attribute "Method" include "by" (による), "based on" (に基づく), and "using" (を用いた). Clue phrases with the attribute "Goal" include "for" (のための) and "toward" (に向けて). The technical term extraction means 1023 may temporarily store the extracted technical terms in a recording medium (not shown).
[0242] The technical term extraction means 1023 may, for example, extract as a technical term a run of consecutive nouns (a noun phrase) that has the predetermined relationship with a clue phrase. Alternatively, candidates for terms that can be technical terms may be held in advance in a recording medium (not shown), and a term that has the predetermined relationship with a clue phrase and matches one of the held candidates may be extracted as a technical term.
[0243] Next, the operation of the technical term extraction unit 1016 in Pattern 1 is described with reference to the flowchart of FIG. 23. The process in the flowchart of FIG. 23 corresponds to the technical term extraction process (step S1105) in the flowchart of FIG. 17.
(Step S1601) The technical term extraction unit 1016 sets the counter i to 1 and the counter j to 1.
[0244] (Step S1602) The clue phrase search means 1022 searches all or part of the i-th document information acquired by the document information acquisition unit 1015 for the j-th clue phrase stored in the clue phrase storage means 1021. If the clue phrase is found, the process proceeds to step S1603; otherwise, it proceeds to step S1606.
[0245] (Step S1603) The technical term extraction means 1023 extracts, as a technical term, the term that has the predetermined relationship with the clue phrase found. The technical term extraction means 1023 may temporarily store the extracted technical term in a recording medium (not shown).
[0246] (Step S1604) The technical term extraction unit 1016 increments the counter i by 1 and sets the counter j to 1.
(Step S1605) The technical term extraction unit 1016 determines whether the i-th document information acquired by the document information acquisition unit 1015 exists. If it exists, the process returns to step S1602; otherwise, the series of processes for extracting technical terms ends and the process proceeds to step S1106.
[0247] (Step S1606) The technical term extraction unit 1016 increments the counter j by 1.
(Step S1607) The technical term extraction unit 1016 determines whether a j-th clue phrase exists in the clue phrase storage means 1021. If it exists, the process returns to step S1602; otherwise, it proceeds to step S1604.
[0248] This flowchart describes extraction using clue phrases only, but the search term may be used together with the clue phrases, which can improve extraction accuracy. For example, when extracting a technical term from the fragment "sentence extraction using SVM" (SVMを用いた文抽出), if the search term is "sentence extraction", the term "SVM", which is linked to the search term by the clue phrase "using" (を用いた), can be extracted appropriately. If the search term is "SVM", on the other hand, the term to be extracted (the term preceding the clue phrase を用いた) is identical to the search term, so the extraction can be judged inappropriate and skipped. When the region of the document information that contains the search term found by the search unit 1013 has been acquired by the document information acquisition unit 1015, technical terms can be extracted using the search term. When that region has not been acquired, or when the acquired document information was specified by the specifying unit 1014 and does not contain the search term, extraction using the search term may not be possible. The option of using the search term applies equally to the patterns described below.
In this way, technical terms can be extracted.
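As an illustration of Pattern 1, the following Python sketch follows the loop of FIG. 23 and includes the optional search term check of paragraph [0248]. It makes several simplifying assumptions that are not part of the specification: the function name, the `CLUE_PHRASES` table, and the "before"/"after" encoding of the predetermined relationship are hypothetical, and the term is taken to be the entire text on one side of the clue phrase. A practical implementation would more likely isolate a noun phrase by morphological analysis.

```python
# Hypothetical table: clue phrase -> (attribute, side on which the term appears).
CLUE_PHRASES = {
    "を用いた": ("Method", "before"),
    "に基づく": ("Method", "before"),
    "による": ("Method", "before"),
    "のための": ("Goal", "after"),
}


def extract_technical_terms(fragments, clue_phrases=CLUE_PHRASES, search_term=None):
    """Extract one term per fragment using the first clue phrase found."""
    terms = []
    for text in fragments:                                # counter i
        for clue, (_attr, side) in clue_phrases.items():  # counter j
            pos = text.find(clue)                         # step S1602
            if pos < 0:
                continue                                  # steps S1606, S1607
            before = text[:pos]
            after = text[pos + len(clue):]
            term = before if side == "before" else after  # step S1603
            # Optional check: skip a term identical to the search term.
            if term and term != search_term:
                terms.append(term)
            break                                         # step S1604
    return terms


print(extract_technical_terms(["SVMを用いた文抽出", "字幕生成のための文短縮"]))
# prints ['SVM', '文短縮']
```

With `search_term="SVM"`, the first fragment yields nothing, which mirrors the case in which the term to be extracted coincides with the search term.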
[0249] [Pattern 2]
FIG. 24 is a block diagram showing the configuration of the technical term extraction unit 1016 in Pattern 2. In FIG. 24, the technical term extraction unit 1016 includes clue phrase storage means 1021, clue phrase search means 1022, specialized term storage means 1024, and technical term extraction means 1025. The configuration and operation of everything other than the specialized term storage means 1024 and the technical term extraction means 1025 are the same as in Pattern 1, and their description is omitted.
[0250] The specialized term storage means 1024 stores two or more specialized terms. Here, a specialized term is a term specific to one or more research fields that can serve as a technical term. Strictly speaking, what is stored is information indicating a specialized term, but it is simply called a specialized term here. The specialized terms may, for example, be chosen by a person or be obtained mechanically from a plurality of technical documents. In the latter case, for example, simple nouns and the like may be extracted by morphological analysis of the technical documents, each may be given a score indicating how likely it is to be a specialized term, and those given high scores may be acquired as specialized terms. Several kinds of scoring are known, such as scoring based on term-forming ability and scoring based on frequency of occurrence. Other methods may also be used. Methods of acquiring specialized terms are disclosed in the documents listed below, among others, and are conventionally known, so a detailed description is omitted. TermExtract has also been made publicly available as a tool for creating specialized term lists (http://gensen.dl.itc.u-tokyo.ac.jp/termextract.html).
[0251] Reference: Hiroshi Nakagawa, "Automatic Term Recognition based on Statistics of Compound Nouns", Terminology, Vol. 6, No. 2, pp. 195-210, 2000
[0252] Reference: Hirokazu Ohata and Hiroshi Nakagawa, "Extraction of specialized terms based on the number of distinct connected words" (連接異なり語数による専門用語抽出), IPSJ SIG Technical Report, 2000-NL-136, pp. 119-126
[0253] Reference: Hiroshi Nakagawa, Tatsunori Mori, and Hiroaki Yumoto, "Extraction of specialized terms based on occurrence frequency and connection frequency" (出現頻度と連接頻度に基づく専門用語抽出), Journal of Natural Language Processing, Vol. 10, No. 1, pp. 27-45, January 2003
[0254] The specialized term storage means 1024 can be realized by a predetermined recording medium (for example, a semiconductor memory, a magnetic disk, or an optical disk). How the specialized terms come to be stored in the specialized term storage means 1024 does not matter. For example, they may be stored via a recording medium, they may be received over a communication line or the like and then stored, or they may be entered through an input device and then stored.
[0255] When a term that has the predetermined relationship with a clue phrase found by the clue phrase search means 1022 contains a specialized term stored in the specialized term storage means 1024, the technical term extraction means 1025 extracts that specialized term as a technical term. Because nothing other than the specialized terms stored in the specialized term storage means 1024 is extracted as a technical term, the technical term extraction means 1025 can avoid extracting general nouns such as "research" (研究) and "method" (手法) or non-specialized terms such as "proposed method" (提案手法).
[0256] Next, the operation of the technical term extraction unit 1016 in Pattern 2 is described with reference to the flowchart of FIG. 25. The process in the flowchart of FIG. 25 corresponds to the technical term extraction process (step S1105) in the flowchart of FIG. 17. In this flowchart, the processing other than steps S1701 to S1706 is the same as in the flowchart of FIG. 23, except that the technical term extraction means 1023 is replaced by the technical term extraction means 1025, and its description is omitted.
[0257] (Step S1701) The technical term extraction means 1025 extracts the term that has the predetermined relationship with the clue phrase found.
(Step S1702) The technical term extraction means 1025 sets the counter k to 1.
[0258] (Step S1703) The technical term extraction means 1025 determines whether the term extracted in step S1701 contains the k-th specialized term stored in the specialized term storage means 1024. If it does, the process proceeds to step S1704; otherwise, it proceeds to step S1705.
[0259] (Step S1704) The technical term extraction means 1025 extracts, as a technical term, the k-th specialized term contained in the term extracted in step S1701.
(Step S1705) The technical term extraction means 1025 increments the counter k by 1.
[0260] (Step S1706) The technical term extraction means 1025 determines whether a k-th specialized term exists in the specialized term storage means 1024. If it exists, the process returns to step S1703; otherwise, it proceeds to step S1604.
In this way, technical terms can be extracted.
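Steps S1701 to S1706 amount to filtering the terms found next to clue phrases against a stored list. A minimal Python sketch, assuming the terms extracted in step S1701 are already available as strings and using a hypothetical function name and plain substring matching for "contains":

```python
def filter_by_specialized_terms(extracted_terms, specialized_terms):
    """Keep only the stored specialized terms that occur inside the
    terms extracted next to clue phrases (result of step S1701)."""
    technical_terms = []
    for phrase in extracted_terms:
        for term in specialized_terms:        # counter k, steps S1702-S1706
            if term in phrase:                # step S1703
                technical_terms.append(term)  # step S1704
    return technical_terms


print(filter_by_specialized_terms(["線形SVM", "提案手法", "研究"], ["SVM", "文短縮"]))
# prints ['SVM']
```

Note that the stored specialized term itself is returned, not the surrounding phrase, and that phrases such as 提案手法 ("proposed method") are dropped because they contain no stored term.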
[0261] [Pattern 3]
FIG. 26 is a block diagram showing the configuration of the technical term extraction unit 1016 in Pattern 3. In this pattern, the technical term extraction unit 1016 extracts technical term candidates and then selects technical terms from those candidates. In FIG. 26, the technical term extraction unit 1016 includes clue phrase storage means 1021, clue phrase search means 1022, technical term candidate extraction means 1026, and technical term selection means 1027. The configuration and operation of everything other than the technical term candidate extraction means 1026 and the technical term selection means 1027 are the same as in Pattern 1, and their description is omitted.
[0262] The technical term candidate extraction means 1026 extracts, as a technical term candidate, a term that has the predetermined relationship with a clue phrase found by the clue phrase search means 1022. The method of extracting technical term candidates is the same as the method of extracting technical terms described in Patterns 1 and 2, and its description is omitted.
[0263] The technical term selection means 1027 selects all or some of the technical term candidates extracted by the technical term candidate extraction means 1026 as technical terms. Specific selection methods are described later in Patterns 4 and 5.
[0264] Next, the operation of the technical term extraction unit 1016 in Pattern 3 is described with reference to the flowchart of FIG. 27. The process in the flowchart of FIG. 27 corresponds to the technical term extraction process (step S1105) in the flowchart of FIG. 17. In this flowchart, the processing other than steps S1801 and S1802 is the same as in the flowchart of FIG. 23, and its description is omitted.
[0265] (Step S1801) The technical term candidate extraction means 1026 extracts, as a technical term candidate, the term that has the predetermined relationship with the clue phrase found. The technical term candidate extraction means 1026 may temporarily store the extracted technical term candidate in a recording medium (not shown).
[0266] (Step S1802) The technical term selection means 1027 selects technical terms from the technical term candidates extracted by the technical term candidate extraction means 1026. The technical term selection means 1027 may temporarily store the selected technical terms in a recording medium (not shown).
[0267] In step S1802, the technical term selection means 1027 may, for example, decline to select a technical term candidate that is not used as a main topic in any of the document information stored in the document information storage unit 1012, and select a technical term candidate that is used as a main topic in at least one piece of document information. This is because a term that deserves to be selected as a technical term is considered highly likely to be treated as a main topic in some technical document.
[0268] As a specific method, for example, terms indicating main topics may be collected in advance from a plurality of pieces of document information and stored in a recording medium, and the technical term candidates that match any of the stored main topic terms may be selected as technical terms. Alternatively, whether a technical term candidate is used as a term indicating a main topic may be determined by searching the plurality of pieces of document information stored in the document information storage unit 1012. Whether a term is used as a term indicating the main topic of a piece of document information can be determined, for example, as described for the search process performed by the search unit 1013 using the search term. Specifically, a term may be judged to be the main topic when it appears in the document information a predetermined number of times or more, or when it appears in a plurality of predetermined regions (for example, the title and the abstract). Needless to say, the determination may also be made by other methods. For example, the structure of the document information may be analyzed and a term found in the region where the main topic is stated may be judged to be the main topic. The method by which the technical term selection means 1027 selects technical terms is not limited to this, and other methods may be used. Other methods are described later in Patterns 4 and 5.
In this way, technical terms can be extracted by selecting technical terms from the technical term candidates.
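The main topic test of paragraph [0268] can be sketched as follows. The document layout (a dictionary with `title`, `abstract`, and `body` keys), the function names, and the default threshold of three occurrences are assumptions made for illustration only; the specification leaves the predetermined number and the regions open.

```python
def is_main_topic(term, doc, min_count=3):
    """Judge whether `term` is a main topic of `doc`, a dict of
    region name -> text (a hypothetical layout)."""
    full_text = " ".join(doc.values())
    # Criterion 1: the term appears a predetermined number of times or more.
    if full_text.count(term) >= min_count:
        return True
    # Criterion 2: the term appears in both the title and the abstract.
    return term in doc.get("title", "") and term in doc.get("abstract", "")


def select_technical_terms(candidates, documents, min_count=3):
    """Step S1802: keep candidates that are a main topic of some document."""
    return [
        c for c in candidates
        if any(is_main_topic(c, d, min_count) for d in documents)
    ]
```

For example, given one document whose title and abstract both mention "SVM", the candidate list `["SVM", "proposed method"]` would be reduced to `["SVM"]`.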
[0269] [Pattern 4]
FIG. 28 is a block diagram showing the configuration of the technical term extraction unit 1016 in Pattern 4. This pattern describes one example of how technical terms are selected in the technical term extraction unit 1016 of Pattern 3. In FIG. 28, the technical term extraction unit 1016 includes clue phrase storage means 1021, clue phrase search means 1022, technical term candidate extraction means 1026, technical term selection means 1027, and technical term suffix storage means 1028. The configuration and operation of everything other than the technical term suffix storage means 1028 are the same as in Pattern 3, and their description is omitted.
[0270] The technical term suffix storage means 1028 stores one or more suffixes that a technical term can take, for example "model" (モデル), "method" (法), and "algorithm" (アルゴリズム). Strictly speaking, what is stored is information indicating a suffix, but it is simply called a suffix here. The technical term suffix storage means 1028 can be realized by a predetermined recording medium (for example, a semiconductor memory, a magnetic disk, or an optical disk). How the suffixes come to be stored in the technical term suffix storage means 1028 does not matter. For example, they may be stored via a recording medium, they may be received over a communication line or the like and then stored, or they may be entered through an input device and then stored.
The technical term selection means 1027 selects, as a technical term, a technical term candidate that has a suffix stored in the technical term suffix storage means 1028.
[0271] The operation of the technical term extraction unit 1016 in this pattern is the same as in the flowchart of FIG. 27. FIG. 29 is a flowchart showing the details of the technical term selection processing (step S1802) in the flowchart of FIG. 27.
(Step S1901) The technical term selection means 1027 sets counter i to 1 and counter j to 1.
[0272] (Step S1902) The technical term selection means 1027 determines whether the i-th technical term candidate extracted by the technical term candidate extraction means 1026 has the j-th suffix stored in the technical term suffix storage means 1028. If it does, the process proceeds to step S1903; otherwise, it proceeds to step S1906.
[0273] (Step S1903) The technical term selection means 1027 selects the i-th technical term candidate as a technical term. The technical term selection means 1027 may temporarily store the selected technical term in a recording medium (not shown).
(Step S1904) The technical term selection means 1027 increments counter i by 1 and sets counter j to 1.
[0274] (Step S1905) The technical term selection means 1027 determines whether an i-th technical term candidate exists. If it exists, the process returns to step S1902; if not, the series of processes for selecting technical terms ends, and the process proceeds to step S1106.
(Step S1906) The technical term selection means 1027 increments counter j by 1.
[0275] (Step S1907) The technical term selection means 1027 determines whether a j-th suffix exists in the technical term suffix storage means 1028. If it exists, the process returns to step S1902; otherwise, it proceeds to step S1904.
By selecting technical terms from the technical term candidates in this way, technical terms can be extracted.
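The loop of steps S1901 to S1907 can be sketched as two nested iterations, with the outer loop playing the role of counter i and the inner loop the role of counter j. The suffixes are the three examples given in [0270]; the candidate strings and the function name `select_by_suffix` are hypothetical and chosen only for illustration.

```python
# Example suffixes from [0270]: "model", "method", "algorithm".
SUFFIXES = ["モデル", "法", "アルゴリズム"]


def select_by_suffix(candidates, suffixes=SUFFIXES):
    """Select every candidate that ends with one of the stored suffixes."""
    selected = []
    for candidate in candidates:                # counter i (S1904, S1905)
        for suffix in suffixes:                 # counter j (S1906, S1907)
            if candidate.endswith(suffix):      # S1902
                selected.append(candidate)      # S1903
                break                           # next candidate, j restarts
    return selected


# Hypothetical technical term candidates.
candidates = ["隠れマルコフモデル", "実験結果", "最大エントロピー法", "Viterbiアルゴリズム"]
print(select_by_suffix(candidates))
```

The `break` corresponds to moving from S1903 to S1904: once one suffix matches, the remaining suffixes are not checked for that candidate.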
[0276] [Pattern 5]
FIG. 30 is a block diagram showing the configuration of the technical term extraction unit 1016 in Pattern 5. Here, one example of how technical terms are selected in the technical term extraction unit 1016 of Pattern 3 is described. In FIG. 30, the technical term extraction unit 1016 includes clue phrase storage means 1021, clue phrase search means 1022, technical term candidate extraction means 1026, technical term selection means 1027, technical term suffix storage means 1028, and document information specifying means 1029. Except as described below, the configuration and operation of everything other than the document information specifying means 1029 are the same as in Pattern 4, and their description is omitted.
[0277] In Pattern 5, the document information acquisition unit 1015 also acquires all or part of the document information specified by the document information specifying means 1029. When at least one of the technical term candidates extracted from all or part of a group of document information acquired by the document information acquisition unit 1015 has a suffix stored in the technical term suffix storage means 1028, the technical term selection means 1027 selects, as a technical term, a predetermined technical term candidate from among the candidates extracted from all or part of that group. Here, a "group of document information" is a set consisting of one piece of document information retrieved by the search unit 1013 or specified by the specifying unit 1014 (referred to as "document information N") and one or more pieces of document information that the document information specifying means 1029 has specified as being in the same field as it. The "predetermined technical term candidate" that the technical term selection means 1027 selects as a technical term is a technical term candidate extracted from all or part of document information N.
[0278] The document information specifying means 1029 specifies document information in the same way as the specifying unit 1014. For example, it may specify document information that has a citation relationship with document information N. Here, the citation relationship may be one in which document information N cites other document information, one in which document information N is cited by other document information, or both. It may also specify document information for which the citation form in that citation relationship is the problem-pointing citation form, that is, a form that points out a problem in the cited technical document. The specific operation of the document information specifying means 1029 is the same as that of the specifying unit 1014, or can easily be derived from it, and its detailed description is omitted.
[0279] The operation of the technical term extraction unit 1016 in this pattern is the same as in the flowchart of FIG. 27. FIG. 31 is a flowchart showing the details of the technical term selection processing (step S1802) in the flowchart of FIG. 27. In this flowchart, the processing other than step S11001 is the same as in the flowchart of FIG. 29, and its description is omitted.
[0280] (Step S11001) The technical term selection means 1027 selects technical term candidates using the document information specified by the document information specifying means 1029. The details of this processing are described later.
[0281] Next, the details of the technical term candidate selection processing (step S11001) in the flowchart of FIG. 31 are described using the flowchart of FIG. 32.
[0282] (Step S11101) The document information specifying means 1029 specifies document information in the same field as the document information that contains the i-th technical term candidate. For example, when specifying document information that is cited in the document information containing the i-th technical term candidate and whose citation form is type C, this can be done by the processing shown in the flowchart of FIG. 18. When, for example, the document information temporary storage means 1020 holds a document ID or the like that identifies document information, the document information specifying means 1029 can use that document ID or the like to identify the document information containing the i-th technical term candidate.
[0283] (Step S11102) The document information acquisition unit 1015 acquires all or part of each piece of document information specified by the document information specifying means 1029. This acquisition can be performed, for example, in the same way as step S1104. If all or part of the specified document information has already been acquired, the document information acquisition unit 1015 may obtain it from the recording medium in which the already acquired portion is stored, or from the document information storage unit 1012.
[0284] (Step S11103) The technical term candidate extraction means 1026 extracts technical term candidates from all or part of the document information acquired in step S11102. This extraction can be performed, for example, in the same way as the processing before step S1802 in the flowchart of FIG. 27.
(Step S11104) The technical term selection means 1027 sets counter m to 1 and counter n to 1.
[0285] (Step S11105) The technical term selection means 1027 determines whether the n-th technical term candidate extracted by the technical term candidate extraction means 1026 in step S11103 has the m-th suffix stored in the technical term suffix storage means 1028. If it does, the process proceeds to step S11110; otherwise, it proceeds to step S11106.
[0286] (Step S11106) The technical term selection means 1027 increments counter m by 1.
(Step S11107) The technical term selection means 1027 determines whether an m-th suffix exists in the technical term suffix storage means 1028. If it exists, the process returns to step S11105; otherwise, it proceeds to step S11108.
[0287] (Step S11108) The technical term selection means 1027 increments counter n by 1 and sets counter m to 1.
(Step S11109) The technical term selection means 1027 determines whether an n-th technical term candidate exists. If it exists, the process returns to step S11105; if not, the series of processes for selecting technical terms using the specified document information ends, and the process proceeds to step S1904.
[0288] (Step S11110) The technical term selection means 1027 selects the i-th technical term candidate as a technical term. The technical term selection means 1027 may temporarily store the selected technical term in a recording medium (not shown).
[0289] The flowcharts of FIGS. 31 and 32 describe the case where technical term candidates are first selected for document information N and then selected for the document information specified as being in the same field as document information N. This is only an example; any method may be used as long as, as a result, both the selection of technical term candidates for document information N and the selection for the document information specified as being in the same field as document information N can be performed.
By selecting technical terms from the technical term candidates in this way, technical terms can be extracted.
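A minimal sketch of the Pattern 5 selection follows. A candidate from document information N is selected if it has a stored suffix itself, or if any candidate extracted from the documents specified as being in the same field has one. The data layout (pairs of document ID and term, plus a mapping from document ID to candidates taken from its related documents), the sample terms, and the function names are assumptions for illustration; in the device, the related documents would come from the document information specifying means 1029.

```python
SUFFIXES = ("モデル", "法", "アルゴリズム")  # example suffixes from [0270]


def has_known_suffix(term, suffixes=SUFFIXES):
    """True if the term ends with any stored suffix."""
    return any(term.endswith(suffix) for suffix in suffixes)


def select_terms_pattern5(candidates, related_candidates, suffixes=SUFFIXES):
    """candidates: list of (doc_id, term) pairs taken from document information N.
    related_candidates: hypothetical dict mapping doc_id to the candidates
    extracted from documents in the same field (steps S11101 to S11103).
    """
    selected = []
    for doc_id, term in candidates:
        if has_known_suffix(term, suffixes):
            selected.append(term)               # same as Pattern 4
            continue
        related = related_candidates.get(doc_id, [])
        # Steps S11104 to S11109: scan related candidates for a known suffix.
        if any(has_known_suffix(r, suffixes) for r in related):
            selected.append(term)               # step S11110
    return selected


candidates = [("001", "文脈木"), ("003", "実験"), ("010", "条件付き確率場モデル")]
related = {"001": ["確率モデル"], "003": ["評価データ"]}
print(select_terms_pattern5(candidates, related))
```

Note that what is selected in step S11110 is the candidate from document information N, not the related candidate that carried the suffix.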
[0290] Patterns 1 to 5 above describe variations in how the technical term extraction unit 1016 extracts technical terms, but needless to say, technical terms may be extracted by methods other than those described. As stated earlier, the extracted technical terms are terms used in the field of the technical documents retrieved by the search unit 1013, and are preferably technical terms that serve as elements in that field. As a method other than Patterns 1 to 5, for example, technical terms may be extracted by analyzing all or part of the document information acquired by the document information acquisition unit 1015. This method is described briefly below.
Assume that the technical term extraction unit 1016 holds information associating the structure tags shown in FIG. 33 with clue phrases, and that the document information acquisition unit 1015 has acquired "SVMを用いた文抽出" ("sentence extraction using SVM") as part of the document information. The technical term extraction unit 1016 searches this text for the clue phrases shown in FIG. 33 and replaces each clue phrase found with the corresponding structure tag, yielding "SVM<Method clue phrase=を用いた>文抽出". Next, it encloses the character string immediately before the structure tag (in English, the character string immediately after it) in the structure tag, yielding "<Method clue phrase=を用いた>SVM</Method>文抽出" as the analysis result. From this result, the technical term extraction unit 1016 can extract the term "SVM", which corresponds to the "Method" tag, as a technical term. For a "Method" tag, only the term enclosed in the tag is extracted. For a "Goal" tag, both the term enclosed in the tag and the term associated with it by the "Goal" clue phrase (usually the term after the "Goal" tag) are extracted. For example, when the document information acquisition unit 1015 acquires "ニュース番組における字幕生成のための文短縮" ("sentence shortening for subtitle generation in news programs"), the structural analysis yields "ニュース番組における<Goal clue phrase=のための>字幕生成</Goal>文短縮", and the technical term extraction unit 1016 extracts as technical terms "字幕生成" ("subtitle generation"), which is enclosed in the "Goal" tag, and "文短縮" ("sentence shortening"), which is associated with it by the "Goal" clue phrase "のための" ("for"). Structure tags other than those shown in FIG. 33 may also be used, although the "Method" and "Goal" tags are the ones generally used for extracting technical terms. The technical term extraction unit 1016 may also use the search term when extracting technical terms. For example, when the analysis result is "<Method clue phrase=を用いた>SVM</Method>文抽出" and the search term is "文抽出", the technical term extraction unit 1016 extracts "SVM", the term that corresponds to the search term "文抽出" and is enclosed in the "Method" tag. On the other hand, when the analysis result is the same and the search term is "SVM", there is no term that corresponds to the search term "SVM" and is enclosed in a "Method" tag, so no technical term is extracted. Using the search term in this way can improve the precision of technical term extraction.
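The structure-tag analysis can be approximated as follows. This is a sketch under stated assumptions: only the two clue phrases quoted in the text ("を用いた" for Method, "のための" for Goal) are used, since FIG. 33 is not reproduced here, and the term boundaries are found with a simple character-class heuristic (a run of kanji, katakana, or alphanumerics) rather than the analysis the device would actually perform. The names `analyze` and `extract_terms` are hypothetical.

```python
import re

# Heuristic stand-in for proper term segmentation (an assumption of this sketch):
# a term is a run of Latin letters, digits, katakana, or kanji.
NOUN_RUN = r"[A-Za-z0-9\u30A0-\u30FF\u4E00-\u9FFF]+"
CLUE_TO_TAG = {"を用いた": "Method", "のための": "Goal"}


def analyze(text):
    """Return (tag, enclosed_term, following_term) for each clue phrase found."""
    found = []
    for clue, tag in CLUE_TO_TAG.items():
        match = re.search(f"({NOUN_RUN}){re.escape(clue)}({NOUN_RUN})?", text)
        if match:
            found.append((tag, match.group(1), match.group(2)))
    return found


def extract_terms(text, search_term=None):
    """Extract technical terms, optionally keeping only those tied to search_term."""
    terms = []
    for tag, enclosed, following in analyze(text):
        if search_term is not None:
            # Keep the enclosed term only if the clue phrase links it to the search term.
            if following == search_term:
                terms.append(enclosed)
            continue
        terms.append(enclosed)
        if tag == "Goal" and following:
            terms.append(following)  # Goal also yields the term after the tag
    return terms


print(extract_terms("SVMを用いた文抽出"))
print(extract_terms("ニュース番組における字幕生成のための文短縮"))
print(extract_terms("SVMを用いた文抽出", search_term="SVM"))
```

How the search-term filter should treat "Goal" tags is not spelled out in the text, so this sketch applies the same rule to both tags.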
[0292] Next, the operation of the technology trend information creation device 1001 according to this embodiment is described using a specific example. In this example, the technology trend information creation device 1001 is assumed to be a stand-alone device.
[0293] In this example, the document information and related data stored in the document information storage unit 1012 are assumed to be as shown in FIG. 34. In FIG. 34, a document ID identifying a technical document, time information, and document information are associated with one another. Entries such as "D001" are data representing each piece of document information, and it is assumed that at least the title area can be identified in each piece of document information.
[0294] First, suppose the user operates a keyboard, mouse, or the like to enter the search term "形態素解析" ("morphological analysis") into the technology trend information creation device 1001. The search term is accepted by the search term reception unit 1011 (step S1101) and passed to the search unit 1013. The search unit 1013 searches for document information whose title area contains the search term "形態素解析" (step S1102). As the search result, the search unit 1013 obtains the document IDs "001, 003, 010, ..." of the document information whose title area contains the search term, and passes them to the specifying unit 1014 and the document information acquisition unit 1015.
[0295] The specifying unit 1014 specifies document information in the same field as the document information identified by the document IDs received from the search unit 1013 (step S1103). In this example, it specifies the document information that is cited in the type C citation form within the document information identified by those document IDs. This processing is described concretely below.
[0296] The specifying unit 1014 first acquires cited document identification information and citation information from the document information identified by the first document ID "001" received from the search unit 1013 (steps S1201, S1202). If the document information identified by document ID "001" is an academic paper, the specifying unit 1014 acquires from the document information storage unit 1012, as cited document identification information, the bibliographic information (title, publication date, authors, publication venue, and so on) of the documents listed in the "References" section of that document information. Because that bibliographic information carries indexes such as 1, 2, 3, ...; 1), 2), 3), ...; [1], [2], [3], ...; or [A], [B], [C], ..., those indexes are acquired as well. Next, the specifying unit 1014 searches the document information using each acquired index and identifies the sentence containing that index. It then acquires, as citation information, the sentence containing the index together with the sentences closely related to it. The method for identifying sentences closely related to the sentence containing the index is described next.
[0297] The specifying unit 1014 is assumed to hold the following two kinds of clue phrases in advance in a recording medium (not shown).
(A) Clue phrases for extending to preceding sentences
In this, On this, Such, But, However, ...
(B) Clue phrases for extending to following sentences
We, we, They, they, Our, our, In this, On this, Such, But, However, ...
[0298] The specifying unit 1014 first takes the sentence containing the index as the candidate sentences for the citation information. If the first of the candidate sentences contains at least one of the (A) clue phrases, the sentence immediately preceding the candidate sentences is added to them. The specifying unit 1014 repeats this extension toward preceding sentences until the first candidate sentence contains none of the (A) clue phrases. Likewise, if the sentence following the candidate sentences (the sentence adjacent to their end and not yet included in them) contains any of the (B) clue phrases, that sentence is added to the candidate sentences. The specifying unit 1014 repeats this extension toward following sentences until the sentence following the candidate sentences contains none of the (B) clue phrases. The candidate sentences at the point where extension ends become the citation information, which the specifying unit 1014 acquires. The (A) and (B) clue phrases are only examples; other clue phrases may be used as long as sentences closely related to the sentence containing the index can be extracted appropriately. Although the description here focuses on the sentence containing the index, in technical documents that refer to a cited document directly by its cited document identification information, the sentence containing that identification information is used in place of the sentence containing the index.
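The extension of the citation context in both directions can be sketched as below, assuming the text has already been split into sentences and the position of the sentence containing the index is known. The clue phrase lists are the examples given for (A) and (B); whole-word matching via a regular expression, the sample sentences, and the function names are assumptions of this sketch.

```python
import re

# Example clue phrases from the text: (A) extends backward, (B) extends forward.
BACKWARD_CLUES = ("In this", "On this", "Such", "But", "However")
FORWARD_CLUES = ("We", "we", "They", "they", "Our", "our") + BACKWARD_CLUES


def contains_clue(sentence, clues):
    """True if any clue phrase occurs in the sentence as whole words."""
    return any(re.search(r"\b" + re.escape(c) + r"\b", sentence) for c in clues)


def expand_citation_context(sentences, index_pos):
    """Return the citation information around sentences[index_pos]."""
    start = end = index_pos
    # Add the preceding sentence while the first candidate sentence has an (A) clue.
    while start > 0 and contains_clue(sentences[start], BACKWARD_CLUES):
        start -= 1
    # Add the following sentence while it has a (B) clue.
    while end + 1 < len(sentences) and contains_clue(sentences[end + 1], FORWARD_CLUES):
        end += 1
    return sentences[start:end + 1]


sentences = [
    "Parsing is hard.",
    "Smith proposed a tagger.",
    "However, this tagger is slow [1].",
    "We address this problem.",
    "Results are shown in Section 4.",
]
for s in expand_citation_context(sentences, 2):
    print(s)
```

The backward test looks at the current first candidate sentence, whereas the forward test looks at the sentence not yet included, which matches the asymmetry in the description.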
[0299] Next, the specifying unit 1014 uses the acquired citation information to determine whether the citation form is type C. The specifying unit 1014 is assumed to hold, in a recording medium (not shown), the information shown in FIG. 35, which associates type C clue phrases with clue phrase positions. The "clue phrase position" in FIG. 35 is the number of sentences counted within the citation information from the sentence containing the index or cited document identification information, which serves as the reference (0). For example, a clue phrase position of "1 to 3" refers to the three sentences from the sentence following the one containing the index or cited document identification information (position = 1) through the third sentence after it (position = 3). A negative position refers to a sentence before the sentence containing the index or cited document identification information. If a type C clue phrase appears in a sentence at its associated position, the citation form of that citation information is judged to be type C. Specifically, if "However" appears in any of the first through third sentences following the sentence containing the index or cited document identification information, the citation form of that citation information is judged to be type C (step S1205).
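The type C test can be sketched as a lookup over position rules. Only the one rule stated in the text ("However" at positions 1 to 3) is encoded here, because the full contents of FIG. 35 are not reproduced; the rule format, sample sentences, and the function name `is_type_c` are assumptions of this sketch.

```python
# Each rule: (clue phrase, first position, last position), with positions counted
# from the sentence containing the index (position 0). Only the example rule
# stated in the text is included here.
TYPE_C_RULES = [("However", 1, 3)]


def is_type_c(citation_sentences, ref_pos, rules=TYPE_C_RULES):
    """True if any rule's clue phrase appears at one of its allowed positions."""
    for clue, first, last in rules:
        for offset in range(first, last + 1):
            pos = ref_pos + offset  # negative offsets would point to earlier sentences
            if 0 <= pos < len(citation_sentences) and clue in citation_sentences[pos]:
                return True
    return False


citation = [
    "Smith proposed a tagger [1].",
    "It is fast.",
    "However, it needs a large dictionary.",
]
print(is_type_c(citation, 0))
print(is_type_c(["Smith proposed a tagger [1].", "We follow this approach."], 0))
```

Because each rule carries its own position range, further clue phrases with negative or differing ranges could be added to the table without changing the function.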
[0300] The specifying unit 1014 temporarily stores the cited document identification information corresponding to citation information whose citation form has been judged to be type C (step S1206). For example, the cited document identification information corresponding to the index contained in citation information of type C is temporarily stored. For every document ID received from the search unit 1013, the specifying unit 1014 acquires citation information and cited document identification information and temporarily stores the cited document identification information whose citation form is type C (steps S1201 to S1209). The temporarily stored cited document identification information is then passed to the document information acquisition unit 1015.
[0301] The document information acquisition unit 1015 acquires, from the document information storage unit 1012, the titles of the document information identified by the document IDs "001, 003, 010, ..." received from the search unit 1013 (steps S1301, S1401 to S1405). It also searches the document information storage unit 1012 using the cited document identification information received from the specifying unit 1014 and, if the title of the document information identified by that information has not yet been acquired, acquires that title (steps S1302, S1303, S1501 to S1506). In this way, the document information acquisition unit 1015 acquires part of the document information (step S1104). FIG. 36 shows an example of the parts (titles) of document information acquired in this way.
[0302] Next, extraction of technical terms by the technical term extraction unit 1016 is described. Here, extraction using the method of pattern 1 is described. The information shown in FIG. 36 is assumed to be stored in the document information temporary storage means 1020, and the clue phrases shown in FIG. 37 are assumed to be stored in the clue phrase storage means 1021. In the clue phrases of FIG. 37, "$" is a symbol indicating the position of the technical term to be extracted. For example, when the clue phrase 「による」 ("by") is contained in a title acquired by the document information acquisition unit 1015, the run of consecutive nouns (noun phrase) immediately before 「による」 is extracted as a technical term.
[0303] The first title shown in FIG. 36, 「文脈木を用いた形態素解析」 ("morphological analysis using a context tree"), contains neither the clue phrase 「による」 ("by") nor 「に基づく」 ("based on"), so the clue phrase search means 1022 determines that the first title does not contain those clue phrases (steps S1601, S1602, S1606, S1607). Next, the clue phrase search means 1022 searches the first title for the clue phrase 「を用いた」 ("using"). Because the title contains 「を用いた」 (step S1602), the clue phrase is found, and the clue phrase search means 1022 instructs the technical term extraction means 1023 to extract the consecutive noun phrase preceding 「を用いた」. The technical term extraction means 1023 then morphologically analyzes the title, extracts the consecutive noun phrase 「文脈木」 ("context tree") preceding the clue phrase as a technical term (step S1603), and stores the technical term 「文脈木」 together with the corresponding document ID "001" in a recording medium (not shown). Technical terms are extracted in this way (step S1105). FIG. 38 shows an example of information associating the technical terms extracted in this way with document IDs. The technical term extraction unit 1016 passes the document IDs contained in the information shown in FIG. 38 to the time information acquisition unit 1017, and passes the information shown in FIG. 38 to the technology trend information creation unit 1018.
[0304] On receiving the document IDs, the time information acquisition unit 1017 searches the document information storage unit 1012 using each document ID as a search key and acquires the time information corresponding to that document ID (step S1106). FIG. 39 shows an example of information associating the time information acquired in this way with document IDs. The time information acquisition unit 1017 passes the information shown in FIG. 39 to the technology trend information creation unit 1018.
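The clue-phrase extraction of [0302] and [0303] can be sketched as below. Two assumptions should be noted. First, the clue patterns are written as "$" followed by the clue phrase, which matches the description of "$" but is not copied from FIG. 37. Second, the embodiment uses morphological analysis to find the consecutive nouns; this sketch substitutes a run of kanji and katakana characters as a rough stand-in, which is sufficient for the example title but would be wrong for many real titles.

```python
import re

# Assumed notation: "$" marks where the technical term sits relative to the clue phrase.
CLUE_PATTERNS = ["$による", "$に基づく", "$を用いた"]

# Stand-in for "consecutive nouns": kanji, katakana, and the iteration mark.
# A real system would use a morphological analyzer instead.
NOUN_RUN = r"[\u4e00-\u9fff\u30a0-\u30ff\u3005]+"


def extract_technical_terms(title, clue_patterns=CLUE_PATTERNS):
    """Return the noun runs that immediately precede any clue phrase in the title."""
    terms = []
    for pattern in clue_patterns:
        if not pattern.startswith("$"):
            continue  # this sketch handles only "term before clue phrase" patterns
        clue = pattern[1:]
        for match in re.finditer(f"({NOUN_RUN}){re.escape(clue)}", title):
            terms.append(match.group(1))
    return terms


# The example from the description: document ID "001".
extracted = [(term, "001") for term in extract_technical_terms("文脈木を用いた形態素解析")]
# extracted == [("文脈木", "001")]
```

The resulting (technical term, document ID) pairs correspond to the kind of association the description attributes to FIG. 38.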
[0305] On receiving the information shown in FIG. 38 from the technical term extraction unit 1016 and the information shown in FIG. 39 from the time information acquisition unit 1017, the technology trend information creation unit 1018 groups the time information by technical term. It then creates an image in which the technical terms are listed vertically, time runs along the horizontal axis, a circle is placed at each time corresponding to each technical term, and a line connects the rightmost and leftmost circles of each term (step S1107). The image also shows the search term, which is received via a path not shown. To generate the image, the technology trend information creation unit 1018 may hold a template or the like in a recording medium (not shown) and use it to generate the technology trend information image.
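The grouping and layout step of [0305] can be sketched as follows. The years, the second term's association, and the text-based rendering are illustrative assumptions; the embodiment produces an image, and the actual values in FIG. 38 and FIG. 39 are not reproduced here.

```python
from collections import defaultdict


def group_times_by_term(term_rows, time_by_doc):
    """Group time information by technical term (term_rows: (term, doc_id) pairs)."""
    grouped = defaultdict(list)
    for term, doc_id in term_rows:
        if doc_id in time_by_doc:
            grouped[term].append(time_by_doc[doc_id])
    return {term: sorted(times) for term, times in grouped.items()}


def render_row(times, first_year, last_year):
    """One row of the chart: 'o' at each time, '-' between leftmost and rightmost."""
    marks = set(times)
    leftmost, rightmost = min(times), max(times)
    cells = []
    for year in range(first_year, last_year + 1):
        if year in marks:
            cells.append("o")
        elif leftmost < year < rightmost:
            cells.append("-")
        else:
            cells.append(" ")
    return "".join(cells)


# Hypothetical values for illustration only.
term_rows = [("文脈木", "001"), ("コーパス", "003"), ("文脈木", "010")]
time_by_doc = {"001": 1995, "003": 1998, "010": 2001}

grouped = group_times_by_term(term_rows, time_by_doc)
for term, times in grouped.items():
    print(render_row(times, 1994, 2002), term)
```

Each printed row stands in for one technical term on the vertical axis, with the connecting line spanning the period over which the term appears.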
[0306] The output unit 1019 displays the technology trend information image created by the technology trend information creation unit 1018 on a display (step S1108). FIG. 40 shows an example of the technology trend information image displayed in this way. The user can thereby see the technical trends (the flow of technology) relating to morphological analysis.
[0307] Although this specific example describes the case where the technology trend information is an image, the technology trend information need not be an image. For example, it may be the information shown in FIG. 41, obtained by combining the information shown in FIG. 38 with the information shown in FIG. 39 using the document IDs. The technology trend information shown in FIG. 41 may be, for example, spreadsheet data.
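A non-image form of the technology trend information, as in [0307], can be sketched by joining the two tables on document ID and writing the result as CSV, which a spreadsheet can open. The column names, the column order, and the year value are assumptions; the layout of FIG. 41 is not reproduced here.

```python
import csv
import io


def join_on_doc_id(term_rows, time_rows):
    """Combine (term, doc_id) rows with (doc_id, time) rows using the document ID."""
    time_by_doc = dict(time_rows)
    return [
        (doc_id, term, time_by_doc[doc_id])
        for term, doc_id in term_rows
        if doc_id in time_by_doc
    ]


def to_csv(rows):
    """Serialize the joined rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["doc_id", "technical_term", "time"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical year for document "001".
rows = join_on_doc_id([("文脈木", "001")], [("001", 1995)])
print(to_csv(rows))
```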
[0308] In this specific example, technical terms may also be extracted using specialized terms, as described for pattern 2. For example, suppose that the specialized terms shown in FIG. 42 are stored in the specialized term storage means 1024, and that the terms having the predetermined relationship with the clue phrase found by the clue phrase search means 1022 are 「日本語コーパス」 ("Japanese corpus") and 「提案手法」 ("proposed method"). Because 「日本語コーパス」 contains 「コーパス」 ("corpus"), which is stored in the specialized term storage means 1024, the technical term extraction means 1025 extracts 「コーパス」 as a technical term. Because 「提案手法」 contains none of the stored specialized terms such as 「コーパス」, the technical term extraction means 1025 does not extract 「提案手法」, a non-specialized term, as a technical term. In this way, the technical term extraction means 1025 extracts 「コーパス」, which is more appropriate as a technical term, rather than 「日本語コーパス」. It also avoids extracting general nouns, non-specialized terms, and the like.
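The pattern 2 filtering of [0308] amounts to a containment check against the stored specialized terms. The sketch below assumes a small dictionary in which only 「コーパス」 is taken from the description; the other entry and the longest-first ordering of multiple hits are illustrative choices that the description does not specify.

```python
def refine_with_specialized_terms(candidate, specialized_terms):
    """Return the stored specialized terms contained in the candidate phrase.

    An empty list means the candidate is treated as non-specialized and dropped.
    """
    hits = [term for term in specialized_terms if term in candidate]
    # Ordering of multiple hits is an assumption of this sketch.
    return sorted(hits, key=len, reverse=True)


dictionary = ["コーパス", "形態素解析"]  # second entry is hypothetical

print(refine_with_specialized_terms("日本語コーパス", dictionary))  # ['コーパス']
print(refine_with_specialized_terms("提案手法", dictionary))        # []
```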
[0309] The specific information used in this example is presented to explain in detail the operation of the technology trend information creation device 1001 according to this embodiment; it was not obtained from a corpus containing actual academic papers or the like.
[0310] As described above, the technology trend information creation device 1001 according to this embodiment extracts technical terms from document information, acquires the time information corresponding to the document information from which the technical terms were extracted, and creates technology trend information using the extracted technical terms and the acquired time information. By using the technology trend information thus created, the evolution of the technologies used in a given research field can be readily understood. For example, one can learn the span of time over which a technical term appears in technical documents (from when until when the term was in use). One can also learn, for example, how frequently a technical term appears in technical documents. A further advantage is that creating the technology trend information does not require a great deal of effort and time from people familiar with the research field.
[0311] In the above embodiment, the document information acquisition unit 1015 acquires all or part of the document information retrieved by the search unit 1013, but this need not be the case. For example, the document information acquisition unit 1015 may acquire all or part of at least some of the plural pieces of document information stored in the document information storage unit 1012. For example, when only document information from a particular research field is stored in the document information storage unit 1012, the document information acquisition unit 1015 may acquire all of the stored document information, or a predetermined number of randomly selected documents. When a search by the search unit 1013 is unnecessary in this way, the technology trend information creation device 1001 need not include the search term acceptance unit 1011 or the search unit 1013.
[0312] In the above embodiment, the specifying unit 1014 specifies document information in the same field as the document information retrieved by the search unit 1013, and the document information acquisition unit 1015 also acquires all or part of the document information specified by the specifying unit 1014, but this need not be the case. That is, the document information acquisition unit 1015 need not acquire the document information specified by the specifying unit 1014. In that case, the technology trend information creation device 1001 need not include the specifying unit 1014.
When the specifying unit 1014 does not use cited-document identification information or citation information to specify document information, the document information need not contain cited-document identification information or citation information.
[0313] (Embodiment 5)
A technology trend information creation device according to Embodiment 5 of the present invention is described with reference to the drawings. In the technology trend information creation device according to this embodiment, the search term accepted by the search term acceptance unit is chosen by selection.
[0314] FIG. 43 is a block diagram showing the configuration of the technology trend information creation device 1002 according to this embodiment. In FIG. 43, the technology trend information creation device 1002 includes a search term acceptance unit 1011, a document information storage unit 1012, a search unit 1013, a specifying unit 1014, a document information acquisition unit 1015, a technical term extraction unit 1016, a time information acquisition unit 1017, a technology trend information creation unit 1018, an output unit 1019, a suffix storage unit 1031, a specialized term extraction unit 1032, a search term accumulation unit 1033, a search term storage unit 1034, and a search term selection acceptance unit 1035. The configuration and operation of the components other than the newly added suffix storage unit 1031, specialized term extraction unit 1032, search term accumulation unit 1033, search term storage unit 1034, and search term selection acceptance unit 1035 are the same as in Embodiment 4, except that the search term acceptance unit 1011 accepts the search term indicated by the selection accepted by the search term selection acceptance unit 1035; their description is omitted.
[0315] The suffix storage unit 1031 stores predetermined suffixes. These suffixes are used when choosing search terms from among the specialized terms extracted by the specialized term extraction unit 1032, described later; examples are 「システム」 ("system"), 「手法」 ("method"), and 「データ」 ("data"). Other appropriate suffixes may of course be used. Strictly speaking, what is stored in the suffix storage unit 1031 is information indicating a suffix, but it is referred to simply as a suffix. The suffix storage unit 1031 can be implemented with a predetermined recording medium (for example, semiconductor memory, a magnetic disk, or an optical disk). How the suffixes come to be stored in the suffix storage unit 1031 does not matter. For example, they may be stored via a recording medium, may be received over a communication line or the like, or may be entered through an input device.
[0316] The specialized term extraction unit 1032 extracts specialized terms from the document information stored in the document information storage unit 1012. Here, a specialized term is a term of art in one or more research fields. Strictly speaking, it is information indicating a specialized term, but it is referred to simply as a specialized term. The specialized term extraction unit 1032 may, for example, morphologically analyze the document information to extract simple nouns and the like, assign each a score indicating the likelihood that it is a specialized term, and take those given high scores as specialized terms. Several kinds of scoring method are known, such as scoring based on word-formation capacity and scoring based on frequency of occurrence. Other methods may also be used. Methods for extracting specialized terms are disclosed in, for example, the documents cited in Embodiment 4 and are conventionally known, so a detailed description is omitted.
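As one illustration of the frequency-based scoring mentioned in [0316], the sketch below counts how often each noun occurs and keeps those at or above a threshold. It assumes the nouns have already been extracted by a morphological analyzer, and the threshold value and sample data are hypothetical; the scoring actually used in the cited literature is more elaborate than a raw count.

```python
from collections import Counter


def score_by_frequency(noun_lists):
    """Score each noun by its number of occurrences across all documents."""
    return Counter(noun for nouns in noun_lists for noun in nouns)


def select_specialized_terms(noun_lists, min_count=2):
    """Keep nouns whose frequency score reaches the (assumed) threshold."""
    scores = score_by_frequency(noun_lists)
    return [noun for noun, count in scores.most_common() if count >= min_count]


# Hypothetical pre-tokenized nouns, one list per document.
nouns_per_document = [
    ["形態素解析", "コーパス"],
    ["形態素解析", "手法"],
    ["コーパス", "形態素解析"],
]
print(select_specialized_terms(nouns_per_document))
```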
[0317] When a term formed by appending a suffix stored in the suffix storage unit 1031 to a specialized term extracted by the specialized term extraction unit 1032 is contained in the document information stored in the document information storage unit 1012, the search term accumulation unit 1033 accumulates that specialized term in the search term storage unit 1034 as a search term. By choosing the suffixes stored in the suffix storage unit 1031 appropriately, search terms suitable for determining the field for which technology trends are to be created can be accumulated. For example, suppose that the suffix 「システム」 ("system") is stored in the suffix storage unit 1031 and that the specialized term extraction unit 1032 has extracted 「形態素」 ("morpheme") and 「形態素解析」 ("morphological analysis"). If 「形態素システム」 ("morpheme system") does not appear in the document information but 「形態素解析システム」 ("morphological analysis system") does, then 「形態素解析」 is accumulated as a search term and 「形態素」 is not.
[0318] The search term storage unit 1034 stores two or more search terms. These search terms are those accumulated by the search term accumulation unit 1033. The search term storage unit 1034 can be implemented with a predetermined recording medium (for example, semiconductor memory, a magnetic disk, or an optical disk).
[0319] The search term selection acceptance unit 1035 accepts the selection of one of the two or more search terms stored in the search term storage unit 1034. To allow this selection to be accepted, the two or more search terms stored in the search term storage unit 1034 may or may not be output. In the former case, for example, the search terms may be shown on a display or the like, the user may click the desired search term by operating a mouse or the like, and the search term selection acceptance unit 1035 may accept that operation. In the latter case, for example, the user may have a document or the like listing the two or more search terms stored in the search term storage unit 1034 together with an identification symbol for each search term, the user may enter the identification symbol of the desired search term by operating a keyboard or the like, and the search term selection acceptance unit 1035 may accept that input. As noted above, the search term acceptance unit 1011 accepts the search term indicated by the selection accepted by the search term selection acceptance unit 1035.
[0320] Acceptance by the search term selection acceptance unit 1035 may be, for example, acceptance of information entered from an input device (for example, a keyboard, mouse, or touch panel) or reception of information transmitted over a wired or wireless communication line. The search term selection acceptance unit 1035 may or may not include a device for performing the acceptance (for example, a modem or network card). The search term selection acceptance unit 1035 may be implemented in hardware, or in software such as a driver that drives a predetermined device.
[0321] Next, the operation by which the technology trend information creation device 1002 according to this embodiment creates technology trend information is the same as in the flowchart of FIG. 17 except for the search term acceptance process (step S1101), and its description is omitted. FIG. 44 is a flowchart showing the process that replaces the search term acceptance process (step S1101) in the flowchart of FIG. 17. That is, the technology trend information creation device 1002 according to this embodiment performs steps S11201 to S11203 shown in FIG. 44 in place of step S1101.
[0322] (Step S11201) The search term selection acceptance unit 1035 determines whether it has accepted the selection of a search term. If it has, processing proceeds to step S11202; otherwise, step S11201 is repeated until a selection is accepted.
[0323] (Step S11202) The search term selection acceptance unit 1035 reads, from the two or more search terms stored in the search term storage unit 1034, the search term indicated by the selection it accepted.
[0324] (Step S11203) The search term selection acceptance unit 1035 inputs the search term read from the search term storage unit 1034 in step S11202 to the search term acceptance unit 1011. As a result, that search term is accepted by the search term acceptance unit 1011. Processing then proceeds to step S1102.
[0325] The flowchart of FIG. 44 describes the case where the search term selection acceptance unit 1035 reads the search term stored in the search term storage unit 1034 and inputs it to the search term acceptance unit 1011. Alternatively, the search term acceptance unit 1011 may itself read, from the search term storage unit 1034, the search term indicated by the selection accepted by the search term selection acceptance unit 1035.
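Steps S11201 to S11203 can be sketched as a small loop. The sketch models the keyboard case of [0319], in which each stored search term has an identification symbol; the symbols, the stored terms, and the choice to treat an unknown symbol as "no selection yet" are assumptions made for illustration.

```python
def accept_selected_search_term(inputs, stored_terms):
    """Wait for a selection (S11201), read the selected term (S11202), and return
    it so that it can be handed to the search term acceptance unit (S11203).

    inputs: an iterable of identification symbols entered by the user.
    stored_terms: mapping from identification symbol to stored search term.
    """
    for symbol in inputs:              # S11201: repeat until a selection is accepted
        if symbol in stored_terms:
            return stored_terms[symbol]  # S11202: read the selected search term
    return None                        # input ended without a selection


# Hypothetical stored search terms and user key presses.
stored = {"1": "形態素解析", "2": "コーパス"}
selected = accept_selected_search_term(iter(["9", "2"]), stored)
```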
[0326] Next, the process by which search terms are accumulated in the search term storage unit 1034 is described with reference to the flowchart of FIG. 45.
(Step S11301) The specialized term extraction unit 1032 extracts specialized terms from the document information stored in the document information storage unit 1012. The method of extracting specialized terms is as described above, and a detailed description is omitted. The extracted specialized terms may be temporarily stored in a recording medium (not shown).
[0327] (Step S11302) The search term accumulation unit 1033 sets counter i to 1 and counter j to 1.
(Step S11303) The search term accumulation unit 1033 searches the document information storage unit 1012 for the term formed by appending the j-th suffix stored in the suffix storage unit 1031 to the i-th specialized term extracted by the specialized term extraction unit 1032, and determines whether that term is contained in the document information stored in the document information storage unit 1012. If it is, processing proceeds to step S11304; otherwise, processing proceeds to step S11307.
[0328] (Step S11304) The search term accumulation unit 1033 accumulates the i-th specialized term in the search term storage unit 1034 as a search term.
(Step S11305) The search term accumulation unit 1033 increments counter i by 1 and sets counter j to 1.
[0329] (Step S11306) The search term accumulation unit 1033 determines whether an i-th specialized term extracted by the specialized term extraction unit 1032 exists. If it does, processing returns to step S11303; otherwise, the series of processes for accumulating search terms ends.
[0330] (Step S11307) The search term accumulation unit 1033 increments counter j by 1.
(Step S11308) The search term accumulation unit 1033 determines whether a j-th suffix is stored in the suffix storage unit 1031. If it is, processing returns to step S11303; otherwise, processing proceeds to step S11305.
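The loop of steps S11302 to S11308 can be sketched as follows, using the example of [0317]. The function name, the use of plain strings for document information, and the sample title are assumptions; the counters are 0-based here, whereas the flowchart counts from 1.

```python
def accumulate_search_terms(specialized_terms, suffixes, documents):
    """Keep each specialized term for which term + some suffix occurs in a document."""
    stored = []
    i, j = 0, 0                                   # S11302
    while i < len(specialized_terms):             # S11306: does an i-th term exist?
        if j >= len(suffixes):                    # S11308: no j-th suffix remains
            i, j = i + 1, 0                       # S11305
            continue
        candidate = specialized_terms[i] + suffixes[j]
        if any(candidate in doc for doc in documents):   # S11303
            stored.append(specialized_terms[i])          # S11304
            i, j = i + 1, 0                              # S11305
        else:
            j += 1                                       # S11307
    return stored


terms = ["形態素", "形態素解析"]
suffixes = ["システム", "手法", "データ"]
documents = ["日本語形態素解析システムの構築"]  # hypothetical title

print(accumulate_search_terms(terms, suffixes, documents))  # ['形態素解析']
```

Moving to the next specialized term as soon as one suffix matches follows the flowchart, where step S11304 leads directly to step S11305, so each term is stored at most once.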
[0331] As described above, the technology trend information creation device 1002 according to this embodiment extracts specialized terms and selects and accumulates search terms from among them. Provided that the extraction of specialized terms and the selection of search terms are performed appropriately, the search term acceptance unit 1011 can be made to accept an appropriate search term indicating a research field. As a result, appropriate technology trend information can be created.
[0332] In this embodiment, the search terms stored in the search term storage unit 1034 are accumulated by the search term accumulation unit 1033, but they may be accumulated by other means. For example, a user or the like may choose appropriate search terms, which are then stored in the search term storage unit 1034. How the search terms come to be stored in the search term storage unit 1034 in that case does not matter. For example, they may be stored via a recording medium, may be received over a communication line or the like, or may be entered through an input device. In such cases, the technology trend information creation device 1002 need not include the suffix storage unit 1031, the specialized term extraction unit 1032, or the search term accumulation unit 1033.
[0333] (Embodiment 6)
A technology trend information creation device according to Embodiment 6 of the present invention is described with reference to the drawings. In the technology trend information creation device according to this embodiment, when a document figure corresponding to a piece of document information is selected in the displayed technology trend information image, information about that document information is displayed.
[0334] FIG. 46 is a block diagram showing the configuration of the technology trend information creation device 1003 according to this embodiment. In FIG. 46, the technology trend information creation device 1003 includes a search term acceptance unit 1011, a document information storage unit 1012, a search unit 1013, a specifying unit 1014, a document information acquisition unit 1015, a technical term extraction unit 1016, a time information acquisition unit 1017, a technology trend information creation unit 1018, an output unit 1019, a figure selection acceptance unit 1041, and a display information acquisition unit 1042. The configuration and operation of the components other than the newly added figure selection acceptance unit 1041 and display information acquisition unit 1042 are the same as in Embodiment 4, except that the technology trend information creation unit 1018 creates technology trend information that displays the display information acquired by the display information acquisition unit 1042; their description is omitted.
[0335] In this embodiment, the technology trend information is an image in which, for each technical term, document figures (figures corresponding to the document information containing that technical term) are displayed at positions corresponding to the times indicated by the time information of that document information. The document figure may have any shape. It may be a circle, as in the technology trend information shown in FIG. 40, a triangle, a rectangle, or any other shape.
[0336] The figure selection receiving unit 1041 receives a selection of a document figure in the technology trend information. The selection may be received through a GUI (Graphical User Interface), for example when the user operates a mouse or the like to bring the pointer over a document figure (mouse-over). Alternatively, each document figure in the technology trend information may be associated with identification information that identifies it (for example, a number or a letter), and the selection may be received when the user enters that identification information with a keyboard or the like.
[0337] The figure selection receiving unit 1041 may receive information entered from an input device (for example, a keyboard, a mouse, or a touch panel), or may receive information transmitted over a wired or wireless communication line. The figure selection receiving unit 1041 may or may not include a device for receiving (for example, a modem or a network card). The figure selection receiving unit 1041 may be realized by hardware, or by software such as a driver that drives a given device.
[0338] The display information acquisition unit 1042 acquires display information from the document information storage unit 1012. Here, display information is information about the document information that corresponds to the document figure whose selection was received by the figure selection receiving unit 1041, and it is displayed in the image of the technology trend information. The information about the document information is, for example, bibliographic information, more specifically the title, the author, the publication year, and so on. The technology trend information creation unit 1018 creates technology trend information that displays the display information acquired by the display information acquisition unit 1042. The display information may be displayed, for example, near the document figure whose selection was received by the figure selection receiving unit 1041, or, if an area for displaying display information has been set, in that area.
[0339] The operation by which the technology trend information creation device 1003 according to the present embodiment creates technology trend information is the same as in the flowchart of FIG. 17, and its description is omitted. FIG. 47 shows the processing executed after the technology trend information output process (step S1108) in the flowchart of FIG. 17. That is, the technology trend information creation device 1003 according to the present embodiment performs steps S11401 to S11406 shown in FIG. 47 after step S1108.
[0340] (Step S11401) The figure selection receiving unit 1041 determines whether a selection of a document figure has been received. If so, the process proceeds to step S11402; otherwise, it proceeds to step S11405.
[0341] (Step S11402) The display information acquisition unit 1042 acquires, from the document information storage unit 1012, the display information, which is information about the document figure whose selection was received by the figure selection receiving unit 1041.
[0342] (Step S11403) The technology trend information creation unit 1018 creates technology trend information that includes the display information acquired by the display information acquisition unit 1042.
(Step S11404) The output unit 1019 outputs the technology trend information, including the display information, created by the technology trend information creation unit 1018. The process then returns to step S11401.
[0343] (Step S11405) The output unit 1019 determines whether to end the output of the technology trend information. For example, the output unit 1019 may determine to end the output when a predetermined time has elapsed since the technology trend information was output, when an instruction to end the output is received, or when an instruction to output other information is received. If the output is to end, the process proceeds to step S11406; otherwise, it returns to step S11401.
[0344] (Step S11406) The output unit 1019 ends the output of the technology trend information. The process then returns to step S1101.
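The control flow of steps S11401 to S11406 can be sketched as a simple event loop. The following is a minimal illustration, not the device's actual implementation: the `Event` class, the event names `"select"` and `"close"`, the `DOCUMENT_STORE` dictionary (a stand-in for the document information storage unit 1012), and the string used to represent the output image are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical in-memory stand-in for the document information storage unit 1012.
DOCUMENT_STORE = {
    "003": {"title": "Example title", "author": "Example author", "year": 1995},
}


@dataclass
class Event:
    kind: str                          # "select" or "close" (assumed event names)
    document_id: Optional[str] = None  # set only for "select" events


def run_display_loop(events: Iterable[Event]) -> List[str]:
    """Process events roughly as in steps S11401 to S11406; return what was output."""
    outputs: List[str] = []
    for event in events:
        if event.kind == "select":                       # S11401: a document figure was selected
            info = DOCUMENT_STORE[event.document_id]     # S11402: acquire the display information
            image = f"trend image + [{info['title']}]"   # S11403: create trend information with it
            outputs.append(image)                        # S11404: output, then wait for the next event
        elif event.kind == "close":                      # S11405: the output should end
            outputs.append("output ended")               # S11406: end the output
            break
    return outputs
```

Calling `run_display_loop([Event("select", "003"), Event("close")])` walks through one selection followed by the end of output.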
[0345] Next, the operation by which the technology trend information creation device 1003 according to the present embodiment displays the display information is described with a specific example. Assume that an image of the technology trend information is shown on the display, as in FIG. 40, and that the user places the mouse pointer on the leftmost of the document figures (circles) corresponding to the minimum cost method. The figure selection receiving unit 1041 then determines that a selection of a document figure has been received (step S11401). The figure selection receiving unit 1041 accesses the technology trend information creation unit 1018 to obtain the document ID "003", which identifies the document information corresponding to the document figure at the position of the mouse pointer, and passes it to the display information acquisition unit 1042.
[0346] The display information acquisition unit 1042 acquires, as display information, the title of the document information identified by the document ID "003" received from the figure selection receiving unit 1041 from the document information storage unit 1012 (step S11402), and passes the display information to the technology trend information creation unit 1018. The technology trend information creation unit 1018 creates technology trend information in which the display information received from the display information acquisition unit 1042 is displayed in association with the position of the mouse pointer at that time (step S11403). The technology trend information creation unit 1018 may receive the position of the mouse pointer from the figure selection receiving unit 1041 or from another component.
[0347] The output unit 1019 outputs the technology trend information, including the display information, created in this way. As a result, the technology trend information shown in FIG. 48 is displayed on the display (step S11404).
[0348] When the user closes the window showing the image of the technology trend information, the output unit 1019 determines that the output of the technology trend information is to end (step S11405), and the output of the technology trend information ends (step S11406).
[0349] In this specific example, the title of the document information is displayed as the display information. As stated above, however, other display information may be displayed. For example, the display information may include the author name or the publication time. The publication time may be the time information.
[0350] As described above, in the technology trend information creation device 1003 according to the present embodiment, when a document figure in the image of the technology trend information is selected, display information about the document information corresponding to that document figure is acquired, and an image of the technology trend information including the display information is created and output. A user viewing the image of the technology trend information can therefore easily learn specific information about the document information that contains a technical term shown in the image, which improves convenience for the user.
[0351] Improvements to the method of extracting technical terms, and experimental results, are described next. Embodiment 4 described various methods of extracting technical terms. This section describes two methods (Method 1 and Method 2) that improve extraction accuracy, for example by incorporating machine learning.
[0352] First, the technical term extraction unit 1016 assigns "Method" tags and "Goal" tags to at least part of the document information acquired by the document information acquisition unit 1015, in the same manner as described above. It also assigns "Restrict" tags, which correspond to cue phrases expressing restriction such as "in" and "regarding", and "Conj" tags, which correspond to cue phrases expressing conjunction such as "and" and "or". Tags other than the "Method" tag, the "Goal" tag, and the "Head" tag described below are referred to as "Other" tags. The case described here is the one in which the document information acquisition unit 1015 acquires the title of a technical document as part of the document information. In addition, a "Head" tag indicating the subject of the document information is assigned. The "Head" tag is assigned to a noun phrase that has neither a "Method" tag nor a "Goal" tag. In most cases, the "Head" tag is assigned to the noun phrase at the end of the acquired title.
[0353] A method of extracting technical terms from titles that have been given "Head" and other tags is briefly described here. From titles in which the search term received by the search term receiving unit 1011 has the "Head" tag, the technical term extraction unit 1016 extracts the terms that have the "Method" tag as technical terms. From titles in which the search term received by the search term receiving unit 1011 has the "Goal" tag, the technical term extraction unit 1016 extracts the terms that have the "Head" tag as technical terms. In this way, technical terms indicating the elemental technologies for the search term can be extracted.
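The extraction rule above can be sketched as follows. This is a minimal illustration under stated assumptions: a title is represented as a list of `(phrase, tag)` pairs, the function name `extract_technical_terms` is hypothetical, and the sample titles are invented for the example rather than taken from the experiment.

```python
from typing import List, Tuple

TaggedTitle = List[Tuple[str, str]]  # (phrase, tag) pairs; an assumed representation


def extract_technical_terms(search_term: str, titles: List[TaggedTitle]) -> List[str]:
    """Collect Method-tagged terms where the search term is the Head,
    and Head-tagged terms where the search term is the Goal."""
    terms: List[str] = []
    for title in titles:
        tags_of_search_term = {tag for phrase, tag in title if phrase == search_term}
        if "Head" in tags_of_search_term:
            wanted = "Method"
        elif "Goal" in tags_of_search_term:
            wanted = "Head"
        else:
            continue  # the search term plays no relevant role in this title
        for phrase, tag in title:
            if tag == wanted and phrase != search_term and phrase not in terms:
                terms.append(phrase)
    return terms


# Invented sample titles, already tagged.
sample_titles = [
    [("HMM", "Method"), ("by", "Other"), ("morphological analysis", "Head")],
    [("morphological analysis", "Goal"), ("for", "Other"), ("minimum cost method", "Head")],
]
extracted = extract_technical_terms("morphological analysis", sample_titles)
```

With these sample titles, the first contributes its "Method" term and the second contributes its "Head" term.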
[0354] However, it is sometimes inappropriate to assign the "Head" tag to the noun phrase at the end of the title acquired by the document information acquisition unit 1015. For example, if the title is "A comparative experiment of SVM and the minimum cost method for Chinese morphological analysis", the final phrase "comparative experiment" is not the subject. The subject is considered to be "SVM" and "minimum cost method". Methods that can assign the "Head" tag appropriately even in such cases (Method 1 and Method 2) are described below.
[0355] First, Method 1 is described. Method 1 prepares a list of terms that are inappropriate as a subject (for example, "research", "study", "development", "influence", "analysis", "a consideration", "evaluation", "discussion", "experimental study", "a study", "application", "effect", and "proposal"), referred to here as the "unnecessary word list". How this list is created is described later. The technical term extraction unit 1016 first assigns "Head" tags, "Method" tags, and so on to the title acquired by the document information acquisition unit 1015 in the formal manner. It then determines whether the term given the "Head" tag is included in the unnecessary word list. If it is not, the process of assigning the "Head" tag ends. If it is, the "Head" tag is withdrawn from that term and assigned instead to a noun phrase that precedes the term and has neither a "Method" tag nor a "Goal" tag. This process is repeated until the term given the "Head" tag is no longer included in the unnecessary word list. In this way, an appropriate "Head" tag is assigned. The description here covers processing for Japanese. For a language such as English, whose word order differs grammatically from Japanese, when the term given the "Head" tag is included in the unnecessary word list, the "Head" tag is newly assigned to a noun phrase that follows the term.
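The repeated reassignment in Method 1 can be sketched as below for the Japanese word order, in which the "Head" tag moves to a preceding noun phrase. Several points are assumptions made only for this illustration: the placeholder tag `"NP"` marks a noun phrase with neither a "Method" nor a "Goal" tag, the nearest preceding such phrase is chosen, a term that loses its "Head" tag is relabeled "Other", and the unnecessary word set contains "comparative experiment" purely for the sake of the example.

```python
from typing import List, Set, Tuple

Unit = Tuple[str, str]  # (phrase, tag); "NP" is a hypothetical tag for an untagged noun phrase


def reassign_head(units: List[Unit], unnecessary_words: Set[str]) -> List[Unit]:
    """Move the Head tag backwards until it no longer sits on an unnecessary word."""
    result = [list(u) for u in units]
    head = next((i for i, u in enumerate(result) if u[1] == "Head"), None)
    while head is not None and result[head][0] in unnecessary_words:
        result[head][1] = "Other"          # assumption: a withdrawn Head becomes Other
        candidate = None
        for i in range(head - 1, -1, -1):  # search the phrases that precede the old Head
            if result[i][1] == "NP":       # noun phrase with neither Method nor Goal
                candidate = i
                break
        if candidate is not None:
            result[candidate][1] = "Head"
        head = candidate                   # loop again in case the new Head is also unnecessary
    return [(phrase, tag) for phrase, tag in result]


# English gloss of the example title, listed in the Japanese phrase order.
title_units = [
    ("Chinese morphological analysis", "Goal"),
    ("for", "Restrict"),
    ("SVM", "NP"),
    ("and", "Conj"),
    ("minimum cost method", "NP"),
    ("comparative experiment", "Head"),
]
fixed = reassign_head(title_units, {"comparative experiment", "research"})
```

In this sketch only the nearest preceding noun phrase receives the "Head" tag. Handling coordinated subjects such as "SVM" and "minimum cost method" together would need an additional rule that the text does not specify.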
[0356] Next, Method 2 is described. Method 2 prepares data in which "Head" tags have been assigned appropriately (training data) and performs machine learning on it. The "Method" tag, the "Goal" tag, and the other tags besides the "Head" tag are learned in the same way. Including in the training data, for each phrase, the information on whether it appears in the unnecessary word list is expected to prevent the "Head" tag from being assigned to terms in that list. An SVM (support vector machine), for example, may be used as the machine learning tool. Such machine learning is expected to eliminate inappropriate "Head" tag assignments.
[0357] Next, the method of creating the unnecessary word list used in Method 1 is described. When "Head" and other tags are assigned formally, the "Head" tag is given to inappropriate terms such as "experiment" and "a consideration" as well as to appropriate terms such as "morphological analysis". However, a term such as "morphological analysis" receives the "Head" tag only in papers in the field of natural language processing, whereas terms such as "experiment" and "a consideration" receive it across a wide range of technical fields. Consequently, if the terms given the "Head" tag by formal tagging are sorted by frequency, unnecessary words such as "experiment" and "a consideration" rank near the top. The unnecessary word list can then be created by manually removing the technical terms from the top-ranked terms.
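The frequency ranking that precedes the manual filtering can be sketched as follows. The function names and the sample data are hypothetical, and the manual removal of technical terms is modeled simply as a set supplied by a human reviewer.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def rank_head_terms(head_terms: Iterable[str], top_n: int) -> List[Tuple[str, int]]:
    """Return the top_n most frequent Head-tagged terms with their counts."""
    return Counter(head_terms).most_common(top_n)


def build_unnecessary_word_list(ranked: List[Tuple[str, int]],
                                technical_terms: Set[str]) -> List[str]:
    """Drop the terms a reviewer has marked as genuine technical terms."""
    return [term for term, _count in ranked if term not in technical_terms]


# Invented Head-tagged terms gathered from formally tagged titles.
heads = ["experiment", "a consideration", "experiment",
         "morphological analysis", "experiment", "a consideration"]
ranked = rank_head_terms(heads, top_n=3)
unnecessary = build_unnecessary_word_list(ranked, {"morphological analysis"})
```

In practice the ranking would be computed over titles from many technical fields, so that field-specific technical terms rank lower than generic words.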
[0358] Finally, experimental results are described for three cases: formal assignment of "Head" and other tags, Method 1, and Method 2. The experiment used data from the cross-language retrieval tasks of NTCIR Workshops 1 and 2. This data is an abstract database covering 1988 to 1997 and contains about 450,000 papers presented at 65 academic societies in Japan. "Head" and other tags were assigned by hand to 1,000 Japanese paper titles extracted at random from this data, and the experiment was carried out using these as training data.
[0359] FIG. 49 shows the results of formal tag assignment, FIG. 50 shows the results of Method 1, and FIG. 51 shows the results of Method 2. In these results, Precision is an indicator of correctness and is obtained by the following formula.
Precision = (number of morphemes for which the system output matches the correct tag) / (number of morphemes tagged by the system)
[0360] Recall is an indicator of coverage and is obtained by the following formula.
Recall = (number of morphemes for which the system output matches the correct tag) / (number of morphemes tagged by the human subjects)
[0361] The F-measure is the harmonic mean of Precision and Recall and is obtained by the following formula.
F-measure = 2 × Recall × Precision / (Recall + Precision)
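The three measures can be computed directly from the morpheme counts. The sketch below uses a hypothetical function name and invented counts that are not taken from FIGS. 49 to 51. Returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
from typing import Tuple


def precision_recall_f(matched: int, system_tagged: int,
                       human_tagged: int) -> Tuple[float, float, float]:
    """Compute Precision, Recall, and the F-measure from morpheme counts."""
    precision = matched / system_tagged if system_tagged else 0.0
    recall = matched / human_tagged if human_tagged else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * recall * precision / (recall + precision)
    return precision, recall, f_measure


# Invented counts: 40 matching morphemes, 50 tagged by the system, 80 tagged by humans.
p, r, f = precision_recall_f(matched=40, system_tagged=50, human_tagged=80)
```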
[0362] As the experimental results in FIGS. 49 to 51 show, with formal tag assignment the precision of "Head" tag extraction, for example, is very low at about 44%. With Methods 1 and 2 (FIGS. 50 and 51), the precision and recall of "Head" tag extraction improve by 30% or more. This confirms the effectiveness of Method 1 and Method 2. Accordingly, technical terms can be extracted with higher accuracy by using Methods 1 and 2 to tag at least part of the document information acquired by the document information acquisition unit 1015.
[0363] An improvement to the display of the technology trend information is described next. The technology trend information was described in Embodiment 4 and elsewhere. When it contains a large number of technical terms, however, it can become difficult for the user to understand. This section therefore describes a method of creating technology trend information in which the technical terms are grouped by category.
[0364] FIG. 52 is a block diagram showing the configuration of a technology trend information creation device 1004 that creates technology trend information in which technical terms are grouped by category. In FIG. 52, the technology trend information creation device 1004 is the same as the technology trend information creation device 1001 of Embodiment 4, except that it further includes a classification result information storage unit 1051 and creates the technology trend information using the classification result information stored there.
[0365] The classification result information storage unit 1051 stores classification result information. Classification result information is information indicating the result of classifying specialized terms. These specialized terms are technical terms that the technical term extraction unit 1016 may extract. In the classification result information, the specialized terms are grouped by category, as shown for example in FIG. 53. In FIG. 53, each category has a category name such as "algorithm", but category names need not be assigned. The classification result information may be created manually or automatically.
[0366] The classification result information storage unit 1051 can be realized by a given recording medium (for example, a semiconductor memory, a magnetic disk, or an optical disk). How the classification result information comes to be stored in the classification result information storage unit 1051 does not matter. For example, it may be stored there via a recording medium, it may be received over a communication line or the like and then stored, or it may be entered through an input device and then stored.
[0367] The technology trend information creation unit 1018 uses the classification result information stored in the classification result information storage unit 1051 to create technology trend information in which the technical terms are classified. Apart from the classification of the technical terms, the technology trend information is the same as in Embodiment 4 and elsewhere. Specifically, the technology trend information creation unit 1018 refers to the classification result information and divides the technical terms extracted by the technical term extraction unit 1016 by category. For example, when the technical terms "minimum cost method", "neural network", "HMM", "tagged corpus", and "untagged corpus" are extracted, the technology trend information creation unit 1018 refers to the classification result information shown in FIG. 53 and divides them into the category "algorithm" and the category "data". For each group of technical terms classified in this way, the technology trend information creation unit 1018 creates technology trend information in the same manner as described above. FIG. 54 shows an example of technology trend information created in this way. Because the technical terms in FIG. 54 are separated by category, the user can easily grasp the technology trend for each category. This is particularly effective when many technical terms are extracted. Alternatively, the technical terms may be classified using the classification result information after the technology trend information has been created in the same manner as in Embodiment 4 and elsewhere.
[0368] In each of the above embodiments, each process or function may be realized by centralized processing in a single device or a single system, or by distributed processing across a plurality of devices or a plurality of systems.
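The grouping of extracted technical terms by category described in paragraph [0367] can be sketched as below. The dictionary `CLASSIFICATION` is a stand-in for the classification result information of FIG. 53, and which term belongs to which category is an assumption made for this example, since the figure is not reproduced here. The `"unclassified"` bucket for terms missing from the classification is likewise an assumption.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for the classification result information (cf. FIG. 53).
CLASSIFICATION: Dict[str, Set[str]] = {
    "algorithm": {"minimum cost method", "neural network", "HMM"},
    "data": {"tagged corpus", "untagged corpus"},
}


def group_terms(terms: List[str],
                classification: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Divide extracted technical terms by category, keeping their input order."""
    groups: Dict[str, List[str]] = {}
    for term in terms:
        for category, members in classification.items():
            if term in members:
                groups.setdefault(category, []).append(term)
                break
        else:
            # Assumption: terms absent from the classification go to a separate bucket.
            groups.setdefault("unclassified", []).append(term)
    return groups


extracted_terms = ["minimum cost method", "neural network", "HMM",
                   "tagged corpus", "untagged corpus"]
grouped = group_terms(extracted_terms, CLASSIFICATION)
```

Technology trend information would then be created separately for each group in `grouped`.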
[0369] In each of the above embodiments, the related term acquisition device and the technology trend information creation device may be stand-alone devices or server devices that form part of a server-client system. In the latter case, the related term acquisition device may receive term information transmitted from a client and may transmit related term information to the client.
[0370] In each of the above embodiments, each component may be configured by dedicated hardware, or, for components that can be realized by software, may be realized by executing a program. For example, each component can be realized by a program execution unit such as a CPU reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.
[0371] The software that realizes the related term acquisition device in Embodiments 1 to 3 is the following program. That is, the program causes a computer to execute: a term information receiving step of receiving term information, which is information indicating a term; a document information acquisition step of searching two or more pieces of document information stored in a document information storage unit (which stores two or more pieces of document information, each being information indicating a document) for document information that contains, in all or part of it, the term indicated by the term information received in the term information receiving step, and acquiring from the document information storage unit all of the retrieved document information, or a part of it that includes at least cited document identification information, which is information identifying other documents cited in the retrieved document information; a cited document information acquisition step of searching the two or more pieces of document information stored in the document information storage unit for cited document information, which is information indicating a document identified by the cited document identification information contained in all or part of the document information acquired in the document information acquisition step, and acquiring from the document information storage unit the part of the retrieved cited document information that corresponds to a predetermined item; a related term information acquisition step of acquiring, from the part of the cited document information acquired in the cited document information acquisition step, related term information, which is information indicating a related term related to the term indicated by the term information received in the term information receiving step; and a related term information output step of outputting the related term information acquired in the related term information acquisition step.
[0372] The software that implements the technical trend information creation device according to the above embodiments is the following program. That is, the program causes a computer to execute: a document information acquisition step of acquiring all or part of at least some of a plurality of pieces of document information stored in a document information storage unit, the document information storage unit storing at least the plurality of pieces of document information, each being information indicating at least a part of a technical document, and time information, which is information indicating a time related to the technical document indicated by the document information; a technical term extraction step of extracting technical terms from all or part of the document information acquired in the document information acquisition step; a time information acquisition step of acquiring, from the document information storage unit, the time information corresponding to the document information that contains the technical terms extracted in the technical term extraction step; a technical trend information creation step of creating, using the technical terms extracted in the technical term extraction step and the time information acquired in the time information acquisition step, technical trend information, which is information associating a technical term with a time related to a technical document containing that technical term; and an output step of outputting the technical trend information created in the technical trend information creation step.
[0373] This program may further cause the computer to execute a search term receiving step of receiving a search term and a search step of searching the document information storage unit using the search term received in the search term receiving step, in which case the document information acquisition step acquires all or part of the retrieved document information.
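The technical trend program, including the optional search step, could be sketched in Python as below. This is a minimal sketch under assumptions: the `(text, year)` record layout, the function names, and the vocabulary-matching extraction are hypothetical placeholders, and the extraction in particular is not the clue-phrase-based method of the embodiments.

```python
from collections import defaultdict


def extract_technical_terms(text, vocabulary):
    """Technical term extraction step.
    Assumption: a term counts as extracted if it appears in a known vocabulary."""
    return {term for term in vocabulary if term in text}


def create_trend_info(records, vocabulary, search_term=None):
    """Technical trend information creation step.

    records: iterable of (text, year) pairs, i.e. document information
             together with its time information.
    search_term: if given, only records containing it are used (search step).
    Returns {technical term: {year: number of documents}}.
    """
    trend = defaultdict(lambda: defaultdict(int))
    for text, year in records:
        if search_term is not None and search_term not in text:
            continue  # document was not retrieved by the search step
        for term in extract_technical_terms(text, vocabulary):
            trend[term][year] += 1
    # Convert nested defaultdicts to plain dicts for output
    return {term: dict(years) for term, years in trend.items()}
```

The resulting mapping associates each technical term with the times of the documents that contain it, which is the association that the technical trend information is defined to express.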
[0374] In the above programs, the receiving step of receiving information, the output step of outputting information, and the like do not include, at a minimum, processing that can be performed only by hardware, for example processing performed by a modem or an interface card in the output step.
[0375] The program may be executed after being downloaded from a server or the like, or may be executed by reading out a program recorded on a predetermined recording medium (for example, an optical disk such as a CD-ROM, a magnetic disk, or a semiconductor memory).
[0376] The program may be executed by a single computer or by a plurality of computers. That is, either centralized processing or distributed processing may be performed.
[0377] FIG. 55 is a schematic diagram showing an example of the external appearance of a computer that executes the above program to implement the related term acquisition device or the technical trend information creation device according to the above embodiments. The above embodiments are implemented by computer hardware and a computer program executed on it.
[0378] In FIG. 55, the computer system 100 includes a computer 101 that contains a CD-ROM (Compact Disk Read Only Memory) drive 105 and an FD (Flexible Disk) drive 106, a keyboard 102, a mouse 103, and a monitor 104.
[0379] FIG. 56 is a diagram showing the computer system. In FIG. 56, the computer 101 includes, in addition to the CD-ROM drive 105 and the FD drive 106, a CPU (Central Processing Unit) 111; a ROM (Read Only Memory) 112 for storing programs such as a boot-up program; a RAM (Random Access Memory) 113 that is connected to the CPU 111, temporarily stores instructions of application programs, and provides temporary storage space; a hard disk 114 that stores application programs, system programs, and data; and a bus 115 that interconnects the CPU 111, the ROM 112, and the other components. The computer 101 may also include a network card (not shown) that provides a connection to a LAN.
[0380] The program that causes the computer system 100 to execute the functions of the related term acquisition device or the technical trend information creation device according to the above embodiments may be stored on a CD-ROM 121 or an FD 122, which is inserted into the CD-ROM drive 105 or the FD drive 106, and may then be transferred to the hard disk 114. Alternatively, the program may be transmitted to the computer 101 via a network (not shown) and stored on the hard disk 114. The program is loaded into the RAM 113 at execution time. The program may also be loaded directly from the CD-ROM 121, the FD 122, or the network.
[0381] The program does not necessarily have to include an operating system (OS), a third-party program, or the like that causes the computer 101 to execute the functions of the related term acquisition device or the technical trend information creation device according to the above embodiments. The program may include only those instructions that call the appropriate functions (modules) in a controlled manner so that the desired results are obtained. How the computer system 100 operates is well known, and a detailed description is omitted.
[0382] The present invention is not limited to the above embodiments. Various modifications are possible, and it goes without saying that they are also included within the scope of the present invention.
Industrial Applicability
[0383] As described above, the related term acquisition device and the like according to the present invention can acquire related term information related to the received term information, and are useful as a device or the like that acquires related terms related to a given term.
[0384] Further, the technical trend information creation device and the like according to the present invention can create technical trend information, which is information associating a technical term with a time related to a technical document containing that technical term, and are useful as a device or the like that creates technical trend information from document information and the like.
Brief Description of Drawings
[0385] [FIG. 1] Block diagram showing the configuration of the related term acquisition device according to Embodiment 1 of the present invention
[FIG. 2] Block diagram showing the configuration of the related term information acquisition unit according to the embodiment
[FIG. 3] Flowchart showing the operation of the related term acquisition device according to the embodiment
[FIG. 4] Flowchart showing the operation of the related term acquisition device according to the embodiment
[FIG. 5] Flowchart showing the operation of the related term acquisition device according to the embodiment
[FIG. 6] Flowchart showing the operation of the related term acquisition device according to the embodiment
[FIG. 7] Diagram for explaining an experimental example in the embodiment
[FIG. 8] Diagram for explaining an experimental example in the embodiment
[FIG. 9] Block diagram showing the configuration of the related term acquisition device according to Embodiment 2 of the present invention
[FIG. 10] Block diagram showing the configuration of the related term information acquisition unit according to the embodiment
[FIG. 11] Block diagram showing the configuration of the related term information acquisition unit according to the embodiment
[FIG. 12] Flowchart showing the operation of the related term information acquisition unit according to Embodiment 3 of the present invention
[FIG. 13] Diagram showing an example of the Type C clue phrase dictionary in the embodiment
[FIG. 14] Diagram showing an example of the Type B clue phrase dictionary in the embodiment
[FIG. 15] Diagram showing an example of the bibliographic information of document information in the embodiment
[FIG. 16] Block diagram showing the configuration of the technical trend information creation device according to Embodiment 4 of the present invention
[FIG. 17] Flowchart showing the operation of the technical trend information creation device according to the embodiment
[FIG. 18] Flowchart showing the operation of the technical trend information creation device according to the embodiment
[FIG. 19] Flowchart showing the operation of the technical trend information creation device according to the embodiment
[FIG. 20] Flowchart showing the operation of the technical trend information creation device according to the embodiment
[FIG. 21] Flowchart showing the operation of the technical trend information creation device according to the embodiment
[FIG. 22] Block diagram showing the configuration of the technical term extraction unit in the embodiment
[FIG. 23] Flowchart showing the operation of the technical term extraction unit in the embodiment
[FIG. 24] Block diagram showing the configuration of the technical term extraction unit in the embodiment
[FIG. 25] Flowchart showing the operation of the technical term extraction unit in the embodiment
[FIG. 26] Block diagram showing the configuration of the technical term extraction unit in the embodiment
[FIG. 27] Flowchart showing the operation of the technical term extraction unit in the embodiment
[FIG. 28] Block diagram showing the configuration of the technical term extraction unit in the embodiment
[FIG. 29] Flowchart showing the operation of the technical term extraction unit in the embodiment
[FIG. 30] Block diagram showing the configuration of the technical term extraction unit in the embodiment
[FIG. 31] Flowchart showing the operation of the technical term extraction unit in the embodiment
[FIG. 32] Flowchart showing the operation of the technical term extraction unit in the embodiment
[FIG. 33] Diagram showing an example of the correspondence between structure tags and clue phrases in the embodiment
[FIG. 34] Diagram showing an example of the information stored in the document information storage unit in the embodiment
[FIG. 35] Diagram showing an example of the correspondence between Type C clue phrases and the positions of the clue phrases in the embodiment
[FIG. 36] Diagram showing an example of the document information acquired by the document information acquisition unit in the embodiment
[FIG. 37] Diagram showing an example of the clue phrases stored in the clue phrase storage means in the embodiment
[FIG. 38] Diagram showing an example of the extracted technical terms in the embodiment
[FIG. 39] Diagram showing an example of the acquired time information in the embodiment
[FIG. 40] Diagram showing an example of the output technical trend information in the embodiment
[FIG. 41] Diagram showing an example of the technical trend information in the embodiment
[FIG. 42] Diagram showing an example of the specialized terms stored in the specialized term storage means in the embodiment
[FIG. 43] Block diagram showing the configuration of the technical trend information creation device according to Embodiment 5 of the present invention
[FIG. 44] Flowchart showing the operation of the technical trend information creation device according to the embodiment
[FIG. 45] Flowchart showing the operation of the technical trend information creation device according to the embodiment
[FIG. 46] Block diagram showing the configuration of the technical trend information creation device according to Embodiment 6 of the present invention
[FIG. 47] Flowchart showing the operation of the technical trend information creation device according to the embodiment
[FIG. 48] Diagram showing an example of the output technical trend information in the embodiment
[FIG. 49] Diagram showing experimental results
[FIG. 50] Diagram showing experimental results
[FIG. 51] Diagram showing experimental results
[FIG. 52] Block diagram showing the configuration of a technical trend information creation device according to another embodiment
[FIG. 53] Diagram showing an example of classification result information in another embodiment
[FIG. 54] Diagram showing an example of the output technical trend information in another embodiment
[FIG. 55] Schematic diagram showing an example of the external appearance of the computer system
[FIG. 56] Diagram showing an example of the configuration of the computer system

Claims

[1] A related term acquisition device comprising:
a document information storage unit in which two or more pieces of document information, each being information indicating a document, are stored;
a term information receiving unit that receives term information, which is information indicating a term;
a document information acquisition unit that searches the two or more pieces of document information stored in the document information storage unit for document information containing, in all or part of it, the term indicated by the term information received by the term information receiving unit, and acquires from the document information storage unit all of the retrieved document information, or a part of the retrieved document information that includes at least cited document identification information, which is information identifying another document cited in the retrieved document information;
a cited document information acquisition unit that searches the two or more pieces of document information stored in the document information storage unit for cited document information, which is information indicating the document identified by the cited document identification information contained in all or part of the document information acquired by the document information acquisition unit, and acquires from the document information storage unit the part of the retrieved cited document information that corresponds to a predetermined item;
a related term information acquisition unit that acquires, from the part of the cited document information acquired by the cited document information acquisition unit, related term information, which is information indicating a related term related to the term indicated by the term information received by the term information receiving unit; and
a related term information output unit that outputs the related term information acquired by the related term information acquisition unit.
[2] The related term acquisition device according to claim 1, wherein the document information includes structure description information, which is information describing the structure of the document information, and the cited document information acquisition unit acquires the part of the cited document information, indicating the cited document identified by the cited document identification information contained in all or part of the document information acquired by the document information acquisition unit, that is associated with the predetermined item by the structure description information.
[3] The related term acquisition device according to claim 1, further comprising a cited document structure information acquisition unit that acquires cited document structure information, which is information indicating the correspondence between one or more parts of the cited document information and the items of those parts,
wherein the cited document information acquisition unit uses the cited document structure information acquired by the cited document structure information acquisition unit to acquire the part of the cited document information that corresponds to the predetermined item.
[4] The related term acquisition device according to any one of claims 1 to 3, wherein the cited document information is information indicating a patent document, and
the part of the cited document information corresponding to the predetermined item is the independent claim part.
[5] The related term acquisition device according to any one of claims 1 to 3, wherein the cited document information is information indicating a patent document, and
the part of the cited document information corresponding to the predetermined item is the part describing the effects of the invention.
[6] The related term acquisition device according to any one of claims 1 to 3, wherein the cited document information is information indicating a patent document, and
the part of the cited document information corresponding to the predetermined item is the part of the claims other than the preamble.
[7] The related term acquisition device according to any one of claims 1 to 6, wherein, when acquiring a part of the document information, the document information acquisition unit acquires a part that includes at least the cited document identification information and at least citation information, which is information indicating a description about the document identified by the cited document identification information, and
the cited document information acquisition unit acquires the part of the cited document information, which is information indicating the document identified by the cited document identification information contained in all or part of the document information acquired by the document information acquisition unit, that corresponds to an item predetermined according to the citation form of the citation information contained in all or part of the document information acquired by the document information acquisition unit.
[8] The related term acquisition device according to claim 7, wherein the citation form includes at least a problem-pointing citation form, which is a form that points out a problem in the cited document, and an argument-basis citation form, which is a form that incorporates the cited document as a basis for an argument.
[9] The related term acquisition device according to claim 8, wherein the cited document information acquisition unit acquires a part having background items when the citation form of the citation information contained in all or part of the document information acquired by the document information acquisition unit is the problem-pointing citation form, and acquires a part having characteristic items when that citation form is the argument-basis citation form.
[10] The related term acquisition device according to any one of claims 1 to 9, wherein the related term information acquisition unit comprises:
related term candidate information acquisition means that acquires related term candidate information, which is information serving as a candidate for a related term, from the part of the cited document information acquired by the cited document information acquisition unit;
relevance calculation means that calculates, for each piece of related term candidate information, the relevance of the term indicated by the related term candidate information acquired by the related term candidate information acquisition means; and
related term information determination means that determines, from among the related term candidate information acquired by the related term candidate information acquisition means, the related term candidate information with the highest corresponding relevance as the related term information.
[11] The related term acquisition device according to any one of claims 1 to 10, wherein the document information storage unit stores at least a first type of document information and a second type of document information,
the document information acquisition unit acquires all or part of the first type of document information from the first type of document information stored in the document information storage unit, and
the cited document information acquisition unit acquires the part of the cited document information from the second type of document information stored in the document information storage unit.
[12] A related term acquisition method used in a related term acquisition device that comprises a document information storage unit in which two or more pieces of document information, each being information indicating a document, are stored, a term information receiving unit, a document information acquisition unit, a cited document information acquisition unit, a related term information acquisition unit, and a related term information output unit, the method comprising:
a term information receiving step in which the term information receiving unit receives term information, which is information indicating a term;
a document information acquisition step in which the document information acquisition unit searches the two or more pieces of document information stored in the document information storage unit for document information containing, in all or part of it, the term indicated by the term information received in the term information receiving step, and acquires from the document information storage unit all of the retrieved document information, or a part of the retrieved document information that includes at least cited document identification information, which is information identifying another document cited in the retrieved document information;
a cited document information acquisition step in which the cited document information acquisition unit searches the two or more pieces of document information stored in the document information storage unit for cited document information, which is information indicating the document identified by the cited document identification information contained in all or part of the document information acquired in the document information acquisition step, and acquires from the document information storage unit the part of the retrieved cited document information that corresponds to a predetermined item;
a related term information acquisition step in which the related term information acquisition unit acquires, from the part of the cited document information acquired in the cited document information acquisition step, related term information, which is information indicating a related term related to the term indicated by the term information received in the term information receiving step; and
a related term information output step in which the related term information output unit outputs the related term information acquired in the related term information acquisition step.
コンピュータに、 On the computer,
用語を示す情報である用語情報を受け付ける用語情報受付ステップと、 A term information receiving step for receiving term information, which is information indicating a term,
a document information acquisition step of searching two or more pieces of document information stored in a document information storage unit, in which two or more pieces of document information (information indicating documents) are stored, for document information that contains, in all or part of it, the term indicated by the term information received in the term information receiving step, and acquiring from the document information storage unit either all of the retrieved document information or a part of it that contains at least cited document identification information, which is information identifying another document cited in the retrieved document information;
a cited document information acquisition step of searching the two or more pieces of document information stored in the document information storage unit for cited document information, which is information indicating the document identified by the cited document identification information contained in all or part of the document information acquired in the document information acquisition step, and acquiring from the document information storage unit the part of the retrieved cited document information that corresponds to a predetermined item;
a related term information acquisition step of acquiring, from the part of the cited document information acquired in the cited document information acquisition step, related term information, which is information indicating a related term related to the term indicated by the term information received in the term information receiving step; and
a related term information output step of outputting the related term information acquired in the related term information acquisition step; the program causing each of these steps to be executed.
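The steps of this claim can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the in-memory `STORE`, its field names (`title`, `text`, `cites`), and the word-counting rule used to turn cited titles into related terms are all assumptions made for illustration, since the claim only requires that related terms be taken from a predetermined item of the cited documents.

```python
from collections import Counter

def related_terms(term, store, item="title"):
    """Collect candidate related terms for `term` via cited documents."""
    # Step 1: documents whose text contains the term.
    hits = [doc for doc in store.values() if term in doc["text"]]
    # Step 2: identifiers of the documents cited by those hits.
    cited_ids = sorted({cid for doc in hits for cid in doc["cites"]})
    # Step 3: the predetermined item (assumed here: the title) of each cited document.
    items = [store[cid][item] for cid in cited_ids if cid in store]
    # Step 4 (assumption): count the words of those items, skipping the query term itself.
    skip = set(term.lower().split())
    counts = Counter(
        word for text in items for word in text.lower().split() if word not in skip
    )
    return [word for word, _ in counts.most_common()]

# Hypothetical document store used only for this example.
STORE = {
    "d1": {"title": "A summarization survey", "text": "We study summarization.", "cites": ["d2", "d3"]},
    "d2": {"title": "Sentence extraction", "text": "Extraction of sentences.", "cites": []},
    "d3": {"title": "Sentence compression", "text": "Compression methods.", "cites": []},
}

print(related_terms("summarization", STORE))
```

Words shared by several cited titles rank first, which is one plausible way to surface terms related to the query.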
[14] A technical trend information creation device comprising:
a document information storage unit that stores at least a plurality of pieces of document information, each being information indicating at least part of a technical document, and time information, which is information indicating a time related to the technical document indicated by the document information;
a document information acquisition unit that acquires all or part of at least some of the plurality of pieces of document information stored in the document information storage unit;
a technical term extraction unit that extracts technical terms from all or part of the document information acquired by the document information acquisition unit;
a time information acquisition unit that acquires, from the document information storage unit, the time information corresponding to the document information containing the technical terms extracted by the technical term extraction unit;
a technical trend information creation unit that uses the technical terms extracted by the technical term extraction unit and the time information acquired by the time information acquisition unit to create technical trend information, which is information associating each technical term with the time related to the technical documents containing that technical term; and
an output unit that outputs the technical trend information created by the technical trend information creation unit.
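As a rough illustration of how the units of claim 14 fit together, the following Python sketch maps each technical term to the years of the documents that contain it. The names `build_trend`, `DOCS`, and the fixed `vocabulary` are hypothetical; in particular, matching against a fixed vocabulary merely stands in for the technical term extraction unit, whose actual method is the subject of later claims.

```python
from collections import defaultdict

def build_trend(documents, vocabulary):
    """Associate each technical term with the years of the documents containing it."""
    trend = defaultdict(set)
    for doc in documents:               # document information acquisition
        for term in vocabulary:         # stand-in for technical term extraction
            if term in doc["text"]:
                trend[term].add(doc["year"])   # time information acquisition
    # Technical trend information: term -> chronologically sorted years.
    return {term: sorted(years) for term, years in trend.items()}

# Hypothetical stored document information with time information.
DOCS = [
    {"text": "translation using HMM", "year": 1998},
    {"text": "translation using SVM", "year": 2003},
    {"text": "parsing using SVM", "year": 2005},
]

TREND = build_trend(DOCS, ["HMM", "SVM"])
for term, years in TREND.items():       # output unit
    print(term, years)
```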
[15] The technical trend information creation device according to claim 14, further comprising:
a search term reception unit that receives a search term; and
a search unit that searches the document information storage unit using the search term received by the search term reception unit,
wherein the document information acquisition unit acquires all or part of the retrieved document information.
[16] The technical trend information creation device according to claim 15, further comprising:
a search term storage unit in which two or more search terms are stored; and
a search term selection reception unit that receives a selection of one of the two or more search terms stored in the search term storage unit,
wherein the search term reception unit receives the search term indicated by the selection received by the search term selection reception unit.
[17] The technical trend information creation device according to claim 16, further comprising:
a specialized term extraction unit that extracts specialized terms from the document information stored in the document information storage unit;
a suffix storage unit in which predetermined suffixes are stored; and
a search term accumulation unit that, when a term formed by appending a suffix stored in the suffix storage unit to a specialized term extracted by the specialized term extraction unit is contained in the document information stored in the document information storage unit, accumulates that specialized term in the search term storage unit as a search term.
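The accumulation rule of claim 17 can be sketched in a few lines of Python. The function name, the English suffixes (written with a leading space), and the sample texts are assumptions for illustration only; the claim itself does not list the suffixes.

```python
def accumulate_search_terms(specialized_terms, suffixes, texts):
    """Keep a specialized term if term + suffix occurs in any stored document text."""
    stored = []
    for term in specialized_terms:
        if any(term + suffix in text for suffix in suffixes for text in texts):
            stored.append(term)   # accumulate as a search term
    return stored

# Hypothetical suffixes and document texts.
SUFFIXES = [" method", " model"]
TEXTS = ["a new clustering method is proposed", "we use a language model"]

print(accumulate_search_terms(["clustering", "language", "corpus"], SUFFIXES, TEXTS))
```

Here "corpus" is dropped because neither "corpus method" nor "corpus model" appears in the stored texts.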
[18] The technical trend information creation device according to any one of claims 15 to 17, wherein the search unit searches for document information corresponding to a technical document whose main topic is the search term received by the search term reception unit.
[19] The technical trend information creation device according to claim 18, wherein
the document information is information including at least the title of a technical document, and
the search unit searches for document information in which the search term received by the search term reception unit is contained in the region indicating the main topic within the title of the technical document indicated by the document information stored in the document information storage unit.
[20] The technical trend information creation device according to claim 19, wherein the region indicating the main topic is the last region of the title.
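Claims 18 to 20 restrict the search to titles whose main-topic region, taken to be the last region of the title, contains the search term. A minimal Python sketch follows; treating "last region" as "the title ends with the search term" is a simplifying assumption, and the function names and sample titles are hypothetical.

```python
def main_topic_matches(title, search_term):
    """Assumed test: the title, ignoring trailing whitespace and a final period, ends with the term."""
    return title.strip().rstrip(".").lower().endswith(search_term.lower())

def search_by_main_topic(titles, search_term):
    """Return the titles whose last region contains the search term."""
    return [title for title in titles if main_topic_matches(title, search_term)]

# Hypothetical titles of stored technical documents.
TITLES = [
    "Statistical approaches to machine translation",
    "Machine translation evaluation metrics",
]

print(search_by_main_topic(TITLES, "machine translation"))
```

Only the first title is returned: the second mentions the term, but its final region is about evaluation metrics.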
[21] The technical trend information creation device according to any one of claims 15 to 20, further comprising a specifying unit that specifies document information in the same field as the document information retrieved by the search unit,
wherein the document information acquisition unit also acquires, from the document information storage unit, all or part of the document information specified by the specifying unit.
[22] The technical trend information creation device according to claim 21, wherein the specifying unit acquires, from the document information retrieved by the search unit, cited document identification information, which is information identifying another technical document cited in that document information, and uses the cited document identification information to specify document information in the same field as the document information retrieved by the search unit.
[23] The technical trend information creation device according to claim 22, wherein the specifying unit also acquires, from the document information retrieved by the search unit, citation information, which is information indicating the description of another technical document cited in that document information, and, when the citation format indicated by the citation information is a problem-pointing citation format, that is, a format that points out a problem in the cited technical document, uses the cited document identification information identifying the technical document described in the citation information to specify document information in the same field as the document information retrieved by the search unit.
[24] The technical trend information creation device according to any one of claims 14 to 23, wherein the technical term extraction unit comprises:
clue phrase storage means in which one or more clue phrases are stored;
clue phrase search means that searches all or part of the document information acquired by the document information acquisition unit for the clue phrases; and
technical term extraction means that extracts, as a technical term, a term having a predetermined relationship with a clue phrase found by the clue phrase search means.
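Clue-phrase extraction as recited in claim 24 can be sketched with a regular expression. The clue phrases ("based on", "using") and the "predetermined relationship" (the run of words immediately following the clue, up to the next punctuation mark) are assumptions chosen for this example; the claim leaves both open.

```python
import re

# Hypothetical contents of the clue phrase storage means.
CLUE_PHRASES = ["based on", "using"]

def extract_technical_terms(text, clue_phrases=CLUE_PHRASES):
    """Extract the word run that directly follows each clue phrase."""
    terms = []
    for clue in clue_phrases:
        # Assumed relationship: letters, hyphens and spaces right after the clue,
        # which stops the match at the next punctuation mark.
        pattern = re.compile(r"\b" + re.escape(clue) + r"\s+([A-Za-z][A-Za-z\- ]*)")
        for match in pattern.finditer(text):
            terms.append(match.group(1).strip())
    return terms

SAMPLE = "Word alignment using hidden Markov models. Tagging based on decision trees, revisited."
print(extract_technical_terms(SAMPLE))
```

Results are grouped by clue phrase in the order the phrases are stored, not by position in the text.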
[25] The technical trend information creation device according to any one of claims 15 to 23, wherein the technical term extraction unit comprises:
clue phrase storage means in which one or more clue phrases are stored;
clue phrase search means that searches all or part of the document information acquired by the document information acquisition unit for the clue phrases; and
technical term extraction means that extracts, as a technical term, a term that is associated with the search term received by the search term reception unit by a clue phrase found by the clue phrase search means.
[26] The technical trend information creation device according to any one of claims 14 to 23, wherein the technical term extraction unit comprises:
clue phrase storage means in which one or more clue phrases are stored;
clue phrase search means that searches all or part of the document information acquired by the document information acquisition unit for the clue phrases;
specialized term storage means in which two or more specialized terms are stored; and
technical term extraction means that, when a term having a predetermined relationship with a clue phrase found by the clue phrase search means contains a specialized term stored in the specialized term storage means, extracts that specialized term as a technical term.
[27] The technical trend information creation device according to any one of claims 14 to 23, wherein the technical term extraction unit comprises:
clue phrase storage means in which one or more clue phrases are stored;
clue phrase search means that searches all or part of the document information acquired by the document information acquisition unit for the clue phrases;
technical term candidate extraction means that extracts, as a technical term candidate, a term having a predetermined relationship with a clue phrase found by the clue phrase search means; and
technical term selection means that selects all or some of the technical term candidates extracted by the technical term candidate extraction means as technical terms.
[28] The technical trend information creation device according to claim 27, wherein
the technical term extraction unit further comprises technical term suffix storage means in which one or more suffixes that a technical term can take are stored, and
the technical term selection means selects, as technical terms, the technical term candidates that have a suffix stored in the technical term suffix storage means.
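The suffix-based selection of claim 28 amounts to a filter over the candidates. The sketch below is illustrative only: the function name, the English suffix list, and the sample candidates are assumptions, since the claim does not enumerate the stored suffixes.

```python
def select_technical_terms(candidates, suffixes):
    """Keep the candidates that end with one of the stored technical term suffixes."""
    return [c for c in candidates if c.endswith(tuple(suffixes))]

# Hypothetical candidates and suffixes.
CANDIDATES = ["hidden Markov model", "this paper", "beam search algorithm"]
TERM_SUFFIXES = ["model", "algorithm", "method"]

print(select_technical_terms(CANDIDATES, TERM_SUFFIXES))
```

A candidate such as "this paper", which may follow a clue phrase but names no technique, is discarded because it carries none of the stored suffixes.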
[29] The technical trend information creation device according to claim 27, wherein the technical term selection means selects, as technical terms, the technical term candidates that are used as a main topic in any of the document information stored in the document information storage unit.
[30] The technical trend information creation device according to claim 27, wherein
the technical term extraction unit further comprises:
document information specifying means that specifies document information in the same field as the document information from which the technical term candidates were extracted; and
technical term suffix storage means in which one or more suffixes that a technical term can take are stored,
the document information acquisition unit also acquires all or part of the document information specified by the document information specifying means, and
the technical term selection means, when at least one of the technical term candidates extracted from all or part of one piece of document information acquired by the document information acquisition unit, and from all or part of the document information specified by the document information specifying means as being in the same field as that one piece of document information, has a suffix stored in the suffix storage means, selects the technical term candidates extracted from all or part of that one piece of document information as technical terms.
[31] The technical trend information creation device according to any one of claims 14 to 30, wherein
the technical trend information is an image showing, for each technical term, the time indicated by the time information corresponding to the document information containing that technical term, and
the output unit displays the technical trend information.
[32] The technical trend information creation device according to claim 31, wherein
the technical trend information is an image in which, for each technical term, a document figure, which is a figure corresponding to document information containing that technical term, is displayed according to the time indicated by the time information corresponding to that document information,
the device further comprising:
a figure selection reception unit that receives a selection of a document figure in the technical trend information; and
a display information acquisition unit that acquires, from the document information storage unit, display information, which is information relating to the document information corresponding to the document figure whose selection was received by the figure selection reception unit and which is to be displayed in the image of the technical trend information,
wherein the technical trend information creation unit creates technical trend information that displays the display information acquired by the display information acquisition unit.
[33] A technical trend information creation method executed using a document information storage unit that stores at least a plurality of pieces of document information, each being information indicating at least part of a technical document, and time information, which is information indicating a time related to the technical document indicated by the document information, together with a document information acquisition unit, a technical term extraction unit, a time information acquisition unit, a technical trend information creation unit, and an output unit, the method comprising:
a document information acquisition step in which the document information acquisition unit acquires all or part of at least some of the plurality of pieces of document information stored in the document information storage unit;
a technical term extraction step in which the technical term extraction unit extracts technical terms from all or part of the document information acquired in the document information acquisition step;
a time information acquisition step in which the time information acquisition unit acquires, from the document information storage unit, the time information corresponding to the document information containing the technical terms extracted in the technical term extraction step;
a technical trend information creation step in which the technical trend information creation unit uses the technical terms extracted in the technical term extraction step and the time information acquired in the time information acquisition step to create technical trend information, which is information associating each technical term with the time related to the technical documents containing that technical term; and
an output step in which the output unit outputs the technical trend information created in the technical trend information creation step.
[34] The technical trend information creation method according to claim 33, executed further using a search term reception unit and a search unit, the method further comprising:
a search term reception step in which the search term reception unit receives a search term; and
a search step in which the search unit searches the document information storage unit using the search term received in the search term reception step,
wherein, in the document information acquisition step, all or part of the retrieved document information is acquired.
[35] A program for causing a computer to execute:
a document information acquisition step of acquiring all or part of at least some of a plurality of pieces of document information stored in a document information storage unit that stores at least the plurality of pieces of document information, each being information indicating at least part of a technical document, and time information, which is information indicating a time related to the technical document indicated by the document information;
a technical term extraction step of extracting technical terms from all or part of the document information acquired in the document information acquisition step;
a time information acquisition step of acquiring, from the document information storage unit, the time information corresponding to the document information containing the technical terms extracted in the technical term extraction step;
a technical trend information creation step of using the technical terms extracted in the technical term extraction step and the time information acquired in the time information acquisition step to create technical trend information, which is information associating each technical term with the time related to the technical documents containing that technical term; and
an output step of outputting the technical trend information created in the technical trend information creation step.
[36] The program according to claim 35, further causing the computer to execute:
a search term reception step of receiving a search term; and
a search step of searching the document information storage unit using the search term received in the search term reception step,
wherein, in the document information acquisition step, all or part of the retrieved document information is acquired.
PCT/JP2007/054257 2006-03-10 2007-03-06 Related term acquiring device, related term acquiring method, technology trend information creating device, technology trend information creating method, and program WO2007105530A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2008505065A JP5078164B2 (en) 2006-03-10 2007-03-06 Related term acquisition device, related term acquisition method, and program

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2006-065052 2006-03-10
JP2006065052 2006-03-10
JP2006-225145 2006-08-22
JP2006225145 2006-08-22

Publications (1)

Publication Number Publication Date
WO2007105530A1 true WO2007105530A1 (en) 2007-09-20

Family

ID=38509365

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2007/054257 WO2007105530A1 (en) 2006-03-10 2007-03-06 Related term acquiring device, related term acquiring method, technology trend information creating device, technology trend information creating method, and program

Country Status (2)

Country Link
JP (2) JP5078164B2 (en)
WO (1) WO2007105530A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009230327A (en) * 2008-03-21 2009-10-08 Fujitsu Ltd Document search method, document search device and recording medium recording document search program
JP2013073619A (en) * 2011-09-26 2013-04-22 Korea Institute Of Science & Technology Infomation Method, device and system for providing technology transition using technology life cycle graph and technology transition information database construction device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103955488B (en) * 2014-04-15 2018-07-06 深圳如果技术有限公司 A kind of method of circuit of shared electronic map and shared end, receiving terminal
JP7092836B2 (en) * 2020-09-15 2022-06-28 楽天グループ株式会社 Proposal device for revision of patent classification, proposal method for revision of patent classification, and program

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0696136A (en) * 1991-01-25 1994-04-08 Teremateiiku Kokusai Kenkyusho:Kk Review processor

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
NANBA H. ET AL.: "Gakujutsu Ronbun Database Kara no Kenkyu Doko Joho no Chushutsu to Kashika", THE ASSOCIATION FOR NATURAL LANGUAGE PROCESSING DAI 12 KAI NENJI TAIKAI WORKSHOP 'GENGO SHORYI TO JOHO KASHIKA NO SETTEN' RONBUNSHU, 13 June 2006 (2006-06-13), XP003017095, Retrieved from the Internet <URL:http://www.nlp.its.hiroshima-cu.ac.jp/~nanba/pdf/nlp2006workshop.pdf> *
NANBA H. ET AL.: "Ronbunkan no Sansho Joho o Koryo Shita Survey Ronbun Sakusei Shien System no Kaihatsu", JOURNAL OF NATURAL LANGUAGE PROCESSING, THE ASSOCIATION FOR NATURAL LANGUAGE PROCESSING, vol. 6, no. 5, 10 July 1999 (1999-07-10), pages 43 - 62, XP003017094 *
NANBA H.: "Ronbunkan no In'yo Joho o Riyo Shita Kanren Yogo no Jido Shushu", THE ASSOCIATION FOR NATURAL LANGUAGE PROCESSING DAI 11 KAI NENJI TAIKAI, 14 March 2005 (2005-03-14) *

Also Published As

Publication number Publication date
JP5463494B2 (en) 2014-04-09
JP5078164B2 (en) 2012-11-21
JP2012226768A (en) 2012-11-15
JPWO2007105530A1 (en) 2009-07-30

Similar Documents

Publication Publication Date Title
Kuzman et al. Automatic genre identification: a survey
CN100424695C (en) Document processing apparatus for searching documents control method therefor,
Trappey et al. An R&D knowledge management method for patent document summarization
JP2005526317A (en) Method and system for automatically searching a concept hierarchy from a document corpus
US20140101542A1 (en) Automated data visualization about selected text
Afzal et al. Rule based Autonomous Citation Mining with TIERL.
Azmeh et al. Automatic web service tagging using machine learning and wordnet synsets
JP2885487B2 (en) Document information retrieval device
JP5463494B2 (en) Technology trend information generator
Rogushina Use of Semantic Similarity Estimates for Unstructured Data Analysis.
Sarwar et al. The key factors and their influence in authorship attribution.
US20080215597A1 (en) Information processing apparatus, information processing system, and program
Dalton et al. Local and global query expansion for hierarchical complex topics
JP2004355550A (en) Natural sentence retrieval device, and its method and program
Stanković et al. Distant Reading in Digital Humanities: Case Study on the Serbian Part of the ELTeC Collection
CN110688559A (en) Retrieval method and device
US10872107B2 (en) Document search system for specialized technical documents
Jain et al. Organizing query completions for web search
JP4813312B2 (en) Electronic document search method, electronic document search apparatus and program
JP5019315B2 (en) Information processing apparatus, information processing method, and program
JP6565565B2 (en) Information processing apparatus, name determination method, and name determination program
Thanadechteemapat et al. Thai word segmentation for visualization of thai web sites
Abdou et al. Unsupervised automatic keywords and keyphrases extractor for web documents
JP2005284776A (en) Text mining apparatus and text analysis method
Ojokoh et al. Generating recommendation status of electronic products from online reviews

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07715225

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2008505065

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07715225

Country of ref document: EP

Kind code of ref document: A1