JP2011159078A

JP2011159078A - Information processing apparatus, determination program and determination method

Info

Publication number: JP2011159078A
Application number: JP2010019649A
Authority: JP
Inventors: Hiroshi Miyahara; 寛宮原; Kozo Nagano; 浩三長野; Yoshitaka Kitagawa; 義隆北川
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2010-01-29
Filing date: 2010-01-29
Publication date: 2011-08-18
Anticipated expiration: 2030-01-29
Also published as: JP5392120B2

Abstract

PROBLEM TO BE SOLVED: To automatically determine whether a side effect reported for a medicine subject to determination is known or unknown. SOLUTION: A similar medicine recognition part 105 recognizes medicines similar to the medicine subject to determination, and a side effect keyword extraction part 107 extracts a side effect phrase subject to determination from a safety information report document 205. A side effect determination/learning part 108 has a keyword similarity evaluation part 109 evaluate each combination of the side effect phrase subject to determination with a phrase indicating a side effect in the package insert 202 of a similar medicine, and compares the evaluation with a threshold to determine whether the side effect indicated by the phrase is known for that similar medicine. The keyword similarity evaluation part 109 aggregates evaluations of the similarity between character substrings obtained by dividing the two phrases, and thereby evaluates the similarity between the two phrases. When character substrings of a third length equal to the sum of a first length and a second length match, a character substring similarity evaluation part 110 gives an evaluation equal to or higher than the sum of the evaluation for matching substrings of the first length and the evaluation for matching substrings of the second length. COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to a technique for determining whether a side effect of a pharmaceutical product is known or unknown.

Under the Japanese Pharmaceutical Affairs Law, when a new side effect of a drug that a pharmaceutical company manufactures and sells (hereinafter also referred to as an “in-house drug”) is discovered after the drug goes on the market, the company is obliged to report that side effect to the Minister of Health, Labour and Welfare. Specifically, when a pharmaceutical company receives a report of a side effect (or of a symptom suspected to be a side effect) from, for example, a hospital physician, it judges whether the reported side effect is known or unknown and reports the side effect to the Minister together with that judgment.

However, when a pharmaceutical company receives a side effect report from a physician or the like, it is difficult to judge whether the reported side effect is known or unknown. Pharmaceutical companies therefore manually consult the package inserts of drugs similar to the in-house drug (that is, drugs of the same class and efficacy; hereinafter “similar drugs”) and check whether the reported side effect is known or unknown for those similar drugs.

A package insert is a document that a pharmaceutical company prepares under the Pharmaceutical Affairs Law and attaches to each individual drug. It includes items such as “therapeutic category name”, “brand name code”, “efficacy or effect”, “side effects”, and “physicochemical findings on the active ingredient”.

Based in part on the results of examining the package inserts of similar drugs in this way, the pharmaceutical company judges whether the side effect reported by the physician or the like is a known side effect or an unknown one.

If a pharmaceutical company's reports to the Minister of Health, Labour and Welfare omit an unknown side effect or are delayed, public health could be seriously affected. It would therefore benefit society as a whole if pharmaceutical companies could judge quickly and accurately whether a side effect reported from a clinical site is a known side effect.

However, the format and notation of package inserts are not strictly standardized, so a simple string search over the package inserts of similar drugs rarely yields an accurate result. At present, therefore, determining whether a side effect reported from a clinical site to a pharmaceutical company is known or unknown relies largely on manual work.

Accordingly, an object of the present invention is to automatically determine whether a side effect reported for a given drug is known or unknown.

An information processing apparatus according to one aspect includes specifying means, report document acquiring means, similar drug recognizing means, substring similarity evaluating means, phrase similarity evaluating means, side effect phrase extracting means, comparison target set acquiring means, determining means, and output means.

The specifying means receives information for specifying a drug and specifies the drug indicated by that information as the determination target drug. The report document acquiring means acquires a report document describing a side effect of the determination target drug. The similar drug recognizing means recognizes, among a plurality of other drugs, similar drugs that resemble the determination target drug. It does so either by reading from storage means similar drug learning result information that associates identification information uniquely identifying a drug with the similar drugs of that drug, or by reading from the storage means, for each of a plurality of drugs, a package insert that contains the identification information, the side effects, and the efficacy or effect of that drug.

When evaluating the similarity between substrings contained in phrases, the substring similarity evaluating means gives, when substrings of a third length equal to the sum of a first length and a second length match, an evaluation equal to or higher than the sum of the evaluation given when substrings of the first length match and the evaluation given when substrings of the second length match. The phrase similarity evaluating means has the substring similarity evaluating means evaluate the similarity between substrings obtained by dividing each of two phrases. By aggregating the results, it evaluates a combination of division patterns, each of which divides one of the two phrases into one or more substrings, and it evaluates the similarity between the two phrases using the evaluations of a plurality of such combinations of division patterns.
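The scoring condition on substring lengths can be illustrated with a short sketch. The quadratic function below is a hypothetical choice, not one given in this passage; it is used only because it satisfies the stated condition that a match of length a + b scores at least as much as separate matches of lengths a and b. The names `match_score` and `is_superadditive` are likewise illustrative.

```python
from typing import Callable


def match_score(length: int) -> int:
    """Hypothetical score for two identical substrings of `length` characters."""
    return length * length  # quadratic growth satisfies the condition below


def is_superadditive(score: Callable[[int], int], max_len: int) -> bool:
    """Check score(a + b) >= score(a) + score(b) for all a, b >= 1 with a + b <= max_len."""
    return all(
        score(a + b) >= score(a) + score(b)
        for a in range(1, max_len)
        for b in range(1, max_len - a + 1)
    )


# One long match (4 characters) outweighs two short matches (2 + 2 characters).
long_match = match_score(4)
two_short_matches = match_score(2) + match_score(2)
```

A function of this kind favors long contiguous matches over several scattered short ones, which appears to be the intent of the condition.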

The side effect phrase extracting means extracts from the report document a phrase indicating the side effect of the determination target drug as a determination target side effect phrase. The comparison target set acquiring means acquires, as a comparison target phrase set, the set of phrases contained in the side effect description of the package insert of a drug recognized as a similar drug by the similar drug recognizing means. It does so in one of two ways: by reading that drug's package insert from the storage means and extracting a set of phrases from its side effect description through phrase extraction processing; or by reading from the storage means side effect learning result information that associates a set of phrases, obtained through phrase extraction processing from the side effect description of that drug's package insert, with the identification information of that drug.

For each of at least some of the similar drugs, the determining means has the phrase similarity evaluating means evaluate combinations of the determination target side effect phrase with phrases contained in the comparison target phrase set acquired for that similar drug. It then uses the evaluation results and a threshold to determine whether the side effect indicated by the determination target side effect phrase is a known side effect of that similar drug. The output means outputs the determination result of the determining means.
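As a minimal sketch of the determination step, the function below compares the reported phrase with each phrase in the comparison target phrase set and applies a threshold. Several details are assumptions made for illustration: the rule that a single phrase at or above the threshold is enough, the use of `>=`, the threshold value, and the stand-in `exact_match` similarity, which is not the evaluator described in this document.

```python
from typing import Callable, Iterable


def is_known_for_similar_drug(
    reported_phrase: str,
    comparison_phrases: Iterable[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> bool:
    """Return True if any comparison phrase scores at or above the threshold."""
    return any(
        similarity(reported_phrase, phrase) >= threshold
        for phrase in comparison_phrases
    )


def exact_match(a: str, b: str) -> float:
    """Stand-in similarity (assumption): 1.0 for identical phrases, else 0.0."""
    return 1.0 if a == b else 0.0


known = is_known_for_similar_drug("anemia", ["rash", "anemia"], exact_match, 0.8)
unknown = is_known_for_similar_drug("nausea", ["rash", "anemia"], exact_match, 0.8)
```

Passing the similarity function as a parameter keeps the threshold logic independent of how phrase similarity is actually computed.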

With the above information processing apparatus, whether a side effect described in a report document for the determination target drug is a known side effect of a similar drug is determined automatically, and the determination result is output. The apparatus can therefore automate a judgment that pharmaceutical companies have conventionally made manually and laboriously.

FIG. 1 is a configuration diagram of the determination apparatus.
FIG. 2 is a configuration diagram of a computer.
FIG. 3 is a flowchart of the side effect determination/learning process.
FIG. 4 is a diagram illustrating an example of the learning result table.
FIG. 5 is a diagram illustrating an example of the side effect determination result screen.
FIG. 6 is a flowchart of the side effect similarity calculation process.
FIG. 7 is a flowchart of the score calculation process.
FIG. 8 is a diagram illustrating the constant values used in processing by the determination apparatus.
FIG. 9 is a diagram schematically illustrating a concrete example of the score calculation process.
FIG. 10 is a flowchart of the score normalization process.
FIG. 11 is a flowchart of the preprocessing that the determination apparatus performs before normal operation starts.
FIG. 12 is a flowchart of the similar drug learning process.
FIG. 13 is a flowchart of the efficacy/effect similarity calculation process.
FIG. 14 is a flowchart of the addition/update process.

Embodiments are described in detail below with reference to the drawings, in the following order.
First, the configuration of a determination apparatus that determines whether a given side effect is known or unknown is described with reference to FIG. 1, and a concrete example of hardware that implements the apparatus is described with reference to FIG. 2. Next, the processing that the determination apparatus performs when a side effect report is received from a clinical site or the like is described with reference to FIGS. 3 to 10, on the assumption that similar drugs have already been learned for every drug. In the embodiment shown in FIGS. 1 to 14, the determination apparatus can make this assumption hold by performing preprocessing. The learning of similar drugs through that preprocessing is therefore described next, with reference to FIGS. 11 to 14. Finally, various modifications are described.

FIG. 1 is a configuration diagram of the determination apparatus. The determination apparatus 100 of FIG. 1 has a storage unit 101 that stores various kinds of information, a side effect processing unit 102 that performs processing related to side effects, and a similar drug processing unit 103 that performs processing related to similar drugs.

The side effect processing unit 102 has a determination target drug designating unit 104, a similar drug recognition unit 105, a report document acquisition unit 106, a side effect keyword extraction unit 107, and a side effect determination/learning unit 108. The side effect processing unit 102 also includes a keyword similarity evaluation unit 109, which in this embodiment is shared with the similar drug processing unit 103. In addition to the shared keyword similarity evaluation unit 109, the similar drug processing unit 103 has an efficacy/effect keyword extraction unit 111 and a similar drug determination/learning unit 112.

The keyword similarity evaluation unit 109 includes a substring similarity evaluation unit 110, and the determination apparatus 100 further includes a preprocessing control unit 113. The preprocessing control unit 113 controls the side effect keyword extraction unit 107 in the side effect processing unit 102 and the similar drug determination/learning unit 112 in the similar drug processing unit 103 so that they perform preprocessing.

FIG. 1 also shows the information stored in the storage unit 101 and the information input to the determination apparatus 100. The operation of each unit in the side effect processing unit 102 and the similar drug processing unit 103 is described after these kinds of information have been explained.

As shown in FIG. 1, the storage unit 101 stores a package insert group 201, which contains a package insert 202 for each of a plurality of drugs. As noted above, a package insert is a document containing sections such as “therapeutic category name”, “brand name code”, “efficacy or effect”, “side effects”, and “physicochemical findings on the active ingredient”.

The “therapeutic category name” is a more or less standardized name such as “antipyretic, analgesic and anti-inflammatory agent” or “antihypertensive agent”. When one drug is used for several purposes, several therapeutic category names may be listed.

The “brand name code” is identification information that uniquely identifies each drug. In this embodiment, the brand name code is used as the identification (ID) of a drug.
The “efficacy or effect” and “side effects” sections allow some freedom in how they are written: they may be written as natural-language sentences or in the form of a list or table.

For example, the “efficacy or effect” section may list the diseases or symptoms for which the drug is effective, separated by commas, as in “rheumatoid arthritis, rheumatic fever, osteoarthritis, ……”. A side effect may be written as a natural-language sentence such as “May induce asthma attacks.”, or it may be given in a table. For example, in a table with a heading column indicating what the side effect relates to, a column indicating how frequently the side effect occurs, and a column giving the specific side effect, the heading “blood” may be associated with the frequency “less than 0.1%” and the specific side effect “anemia”.

The “physicochemical findings on the active ingredient” section may include a generic name such as “aspirin”, a chemical name such as “2-Acetoxybenzoic acid”, a molecular formula, a structural formula, and so on. In some package inserts, the generic name of the active ingredient appears in a section under a different heading, such as “generic name” or “standard name”.

Concrete examples of package inserts can be viewed, for example, on the “Pharmaceuticals and Medical Devices Information Website” (http://www.info.pmda.go.jp/) of the Pharmaceuticals and Medical Devices Agency. Package inserts contain various items beyond those illustrated above, but further details are omitted here.

In addition to the package insert group 201 containing the plurality of package inserts 202 described above, the storage unit 101 stores a synonym dictionary 203 and a learning result table 204. The synonym dictionary 203 is a dictionary in which pairs of synonyms or near-synonyms are registered, and the learning result table 204 is a table that holds the results of learning by the side effect processing unit 102 and the similar drug processing unit 103.

FIG. 1 shows the following two entries of the synonym dictionary 203 in table form, but the content and format of the synonym dictionary 203 are arbitrary and depend on the embodiment.
・An entry pairing “AST” (ASpartate aminoTransferase) with “GOT” (Glutamic Oxaloacetic Transaminase)
・An entry pairing “whole body” (全身) with “body” (体)

The synonym dictionary 203 may be created and provided in advance, or it may be empty in its initial state. In this embodiment the synonym dictionary 203 is learned, so its entries increase over time; however, embodiments in which its content is fixed (that is, in which the synonym dictionary 203 is not learned) are also possible.
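A synonym dictionary of this kind can be sketched as a set of unordered pairs. The class and method names below are hypothetical, and the text itself leaves the format of the dictionary open, so this is only one possible representation. The two sample entries are the ones shown in FIG. 1.

```python
from typing import Iterable, Tuple


class SynonymDictionary:
    """Stores unordered pairs of synonyms or near-synonyms."""

    def __init__(self, pairs: Iterable[Tuple[str, str]] = ()) -> None:
        self._pairs = set()
        for a, b in pairs:
            self.add(a, b)

    def add(self, a: str, b: str) -> None:
        """Register a pair; frozenset makes the order of a and b irrelevant."""
        self._pairs.add(frozenset((a, b)))

    def are_synonyms(self, a: str, b: str) -> bool:
        """Identical terms count as synonyms (assumption); otherwise look up the pair."""
        return a == b or frozenset((a, b)) in self._pairs

    def __len__(self) -> int:
        return len(self._pairs)


# The dictionary may start empty and grow as entries are learned.
synonyms = SynonymDictionary()
synonyms.add("AST", "GOT")
synonyms.add("全身", "体")
```

Storing each pair as a `frozenset` means a lookup succeeds regardless of which term is given first.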

The learning result table 204, described in detail later with reference to FIG. 4, is a table in which each entry corresponds to an individual drug. As shown in FIG. 1, each entry contains the fields “ID”, “efficacy/effect keyword group”, “similar drug list”, “side effect keyword group”, and “known side effect list”.

In the learning result table 204, the brand name code is used as the ID, as described above. The efficacy/effect keyword group is a group of keywords extracted in advance from the “efficacy or effect” section of the package insert 202. The similar drug list is a list of the IDs of similar drugs and is assumed to have been created in advance by the processing described later with reference to FIGS. 11 to 14. The side effect keyword group is a group of keywords extracted in advance from the “side effects” section of the package insert 202. The known side effect list is a list used to learn side effects that have been determined to be known side effects of similar drugs.
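The five fields of an entry can be pictured as a simple record type. The sketch below is illustrative only: the class name, attribute names, container types, and the sample IDs "ID-A" and "ID-B" are assumptions, and the actual layout of the table is the one described with FIG. 4.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LearningResultEntry:
    """One row of the learning result table, keyed by the drug's ID."""

    drug_id: str  # brand name code used as the ID
    efficacy_keywords: set[str] = field(default_factory=set)
    similar_drug_ids: list[str] = field(default_factory=list)
    side_effect_keywords: set[str] = field(default_factory=set)
    known_side_effects: set[str] = field(default_factory=set)


# The table itself, sketched as a dict from ID to entry.
learning_results: dict[str, LearningResultEntry] = {}

entry = LearningResultEntry(
    drug_id="ID-A",  # hypothetical ID
    efficacy_keywords={"rheumatoid arthritis", "rheumatic fever"},
    similar_drug_ids=["ID-B"],  # hypothetical ID of a similar drug
    side_effect_keywords={"anemia", "asthma attack"},
)
learning_results[entry.drug_id] = entry
```

`default_factory` gives every entry its own empty containers, so an entry can be created from the ID alone and filled in as preprocessing and learning proceed.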

In this embodiment, a phrase extracted from a natural-language sentence, list, table, or the like is called a “keyword”. A keyword may be, for example, a simple noun, a compound noun made up of two or more consecutive nouns, or a noun phrase containing an adjective followed by one or more nouns.

As shown in FIG. 1, a safety information report document 205 and a determination target drug ID 206 are input to the determination apparatus 100.
The safety information report document 205 is a report document describing side effects. More specifically, it is a report document concerning safety information that reaches the pharmaceutical company from a clinical site after a drug has gone on sale. In this specification, “safety information” means information about side effects caused by (or suspected of being caused by) administration of the pharmaceutical company's drug.

Although the package insert 202 and the safety information report document 205 both contain information on side effects, they are entirely different documents. The package insert 202 is prepared by the pharmaceutical company in accordance with the Pharmaceutical Affairs Law, whereas the safety information report document 205 is prepared by a physician or the like at a clinical site in order to report to the pharmaceutical company.

The format of the safety information report document 205 is arbitrary. For example, it may contain a natural-language description of side effects, or it may contain a list or table enumerating side effects.

When a safety information report document 205 arrives from a clinical site, the pharmaceutical company is obliged to report to the Minister of Health, Labour and Welfare whether the side effects reported in it are known or unknown. As noted above, this obligation is laid down in the Pharmaceutical Affairs Law.

To support prompt and accurate reporting from the pharmaceutical company to the Minister of Health, Labour and Welfare, the determination apparatus 100 of this embodiment determines automatically or semi-automatically whether the side effects reported in the safety information report document 205 are known or unknown. As detailed later, “semi-automatically” here refers to the case where the determination apparatus 100 makes an intermediate determination, such as “this cannot be concluded to be a known side effect, but there is some possibility that it is known”, and leaves the final determination to user input.

Specifically, in this embodiment the determination target drug ID 206 specifies which drug the safety information reported in the safety information report document 205 concerns. That is, the determination target drug ID 206 is the ID of the drug for which a side effect has been reported in the safety information report document 205. Hereinafter, the drug specified by the determination target drug ID 206 is called the “determination target drug”.

Next, the processing that the side effect processing unit 102 and the similar drug processing unit 103 perform on the information stored in the storage unit 101 and the information input to the determination apparatus 100, as described above, is explained.

The determination target drug designating unit 104 receives the determination target drug ID 206 as input and outputs it to the similar drug recognition unit 105 and the side effect determination/learning unit 108.
The determination target drug designating unit 104 is an example of specifying means that receives information for specifying a drug and specifies the drug indicated by the received information as the determination target drug. Depending on the embodiment, the information for specifying the drug need not be the determination target drug ID 206; it may be a search condition combining one or more items such as the brand name of the drug, the generic name of the active ingredient, and the chemical name of the active ingredient. In that case, the specifying means can be implemented by a search unit that accepts the search condition as input and specifies the determination target drug by searching for a drug that matches the condition. The search unit may extract the items used for searching from the package inserts 202 in advance and index them.

The similar drug recognition unit 105 searches the learning result table 204 using the input determination target drug ID 206 as the search key and reads the similar drug list from the entry whose ID matches the determination target drug ID 206, thereby recognizing the similar drugs of the determination target drug. The similar drug recognition unit 105 then outputs the similar drug list it has read to the side effect determination/learning unit 108.

Meanwhile, the report document acquisition unit 106 acquires the safety information report document 205 for the determination target drug and outputs it to the side effect keyword extraction unit 107. The side effect keyword extraction unit 107 then extracts from the safety information report document 205 one or more keywords indicating side effects of the determination target drug and outputs them to the side effect determination/learning unit 108. The side effect keyword extraction unit 107 is an example of side effect phrase extracting means that extracts a phrase indicating a side effect of the determination target drug as a determination target side effect phrase.

For each keyword that the side effect keyword extraction unit 107 has extracted from the safety information report document 205, the side effect determination/learning unit 108 determines whether the side effect indicated by the keyword is known or unknown, and learns the determination result.

Specifically, the side effect determination/learning unit 108 searches the learning result table 204 using each ID in the similar drug list received from the similar drug recognition unit 105 as a search key and obtains the side effect keyword group of each similar drug. It then has the keyword similarity evaluation unit 109 evaluate the similarity between each keyword in each side effect keyword group and each keyword that the side effect keyword extraction unit 107 extracted from the safety information report document 205.
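The lookup flow described above (target ID, then similar drug list, then each similar drug's side effect keyword group, then keyword pairs to be scored) can be sketched as a generator. The dict layout, the key names "similar_drugs" and "side_effect_keywords", and the IDs "T", "A", and "B" are hypothetical stand-ins for the learning result table 204.

```python
from typing import Dict, Iterator, List, Tuple

Table = Dict[str, Dict[str, List[str]]]


def comparison_pairs(
    table: Table, target_id: str, reported_keywords: List[str]
) -> Iterator[Tuple[str, str, str]]:
    """Yield (similar_drug_id, reported_keyword, known_keyword) triples to be scored."""
    for similar_id in table[target_id]["similar_drugs"]:
        for known_keyword in table[similar_id]["side_effect_keywords"]:
            for reported_keyword in reported_keywords:
                yield similar_id, reported_keyword, known_keyword


table: Table = {
    "T": {"similar_drugs": ["A", "B"], "side_effect_keywords": []},
    "A": {"similar_drugs": [], "side_effect_keywords": ["anemia", "rash"]},
    "B": {"similar_drugs": [], "side_effect_keywords": ["nausea"]},
}
pairs = list(comparison_pairs(table, "T", ["anemia"]))
```

Each triple would then be handed to the keyword similarity evaluation; keeping the similar drug ID in the triple lets the result be reported per similar drug.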

キーワード類似度評価部１０９は、類似度を評価する対象として指定された２つのキーワードの各々をそれぞれ分割して得られる部分文字列同士の類似度を部分文字列類似度評価部１１０に評価させる。そして、キーワード類似度評価部１０９は、部分文字列類似度評価部１１０による評価の結果を集計することで、２つのキーワードの各々を１つ以上の部分文字列に分割する分割パターンの組み合わせを評価する。キーワード類似度評価部１０９は、２つのキーワードそれぞれの分割パターンの複数通りの組み合わせについての評価を用いて、２つのキーワード同士の類似度を評価する。 The keyword similarity evaluation unit 109 causes the partial character string similarity evaluation unit 110 to evaluate the similarity between partial character strings obtained by dividing each of the two keywords specified as targets of the similarity evaluation. Then, the keyword similarity evaluation unit 109 aggregates the evaluation results from the partial character string similarity evaluation unit 110 to evaluate a combination of division patterns, each of which divides one of the two keywords into one or more partial character strings. The keyword similarity evaluation unit 109 evaluates the similarity between the two keywords using the evaluations of a plurality of such combinations of division patterns.

そして、副作用判定・学習部１０８は、キーワード類似度評価部１０９による評価の結果を用いて、安全性情報報告文書２０５から抽出されたキーワードが示す副作用が、判定対象薬の類薬の副作用とどの程度類似しているかを判定する。 Then, the side effect determination / learning unit 108 uses the result of the evaluation by the keyword similarity evaluation unit 109 to determine how similar the side effect indicated by the keyword extracted from the safety information report document 205 is to the side effects of the similar drugs of the determination target drug.

類似の度合が明らかに高ければ、副作用判定・学習部１０８は、安全性情報報告文書２０５から抽出されたキーワードが示す副作用を「類薬において既に知られていた既知の副作用」と判定し、判定結果を出力する。また、類似の度合が明らかに低ければ、副作用判定・学習部１０８は、安全性情報報告文書２０５から抽出されたキーワードが示す副作用を「類薬においても知れられていない、未知の副作用である」と判定し、判定結果を出力する。 If the degree of similarity is clearly high, the side effect determination / learning unit 108 determines that the side effect indicated by the keyword extracted from the safety information report document 205 is “a known side effect that was already known for a similar drug”, and outputs the determination result. If the degree of similarity is clearly low, the side effect determination / learning unit 108 determines that the side effect indicated by the keyword extracted from the safety information report document 205 is “an unknown side effect that is not known even for the similar drugs”, and outputs the determination result.

そして、類似の度合が中程度であれば、副作用判定・学習部１０８は、安全性情報報告文書２０５から抽出されたキーワードが示す副作用を「既知かもしれない副作用」と判定し、判定結果を出力する。この場合、副作用判定・学習部１０８は、「既知かもしれない」と判定された副作用が既知か未知か入力するようユーザに促し、ユーザからの入力を受け取る。 If the degree of similarity is moderate, the side effect determination / learning unit 108 determines that the side effect indicated by the keyword extracted from the safety information report document 205 is “a side effect that may be known”, and outputs the determination result. In this case, the side effect determination / learning unit 108 prompts the user to enter whether the side effect determined to be “possibly known” is known or unknown, and receives the input from the user.
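
The three-way outcome described above (clearly high, clearly low, or moderate similarity) can be sketched as follows. This is only an illustration in Python, not the implementation of the embodiment: the 0-to-1 similarity scale, the two threshold values, and the names `Verdict`, `classify` and `resolve` are assumptions introduced for the example.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    KNOWN = "known"
    POSSIBLY_KNOWN = "possibly known"
    UNKNOWN = "unknown"


# Hypothetical thresholds; the text does not state concrete values here.
HIGH_THRESHOLD = 0.8
LOW_THRESHOLD = 0.3


def classify(similarity: float) -> Verdict:
    """Map a similarity score to one of the three outcomes."""
    if similarity >= HIGH_THRESHOLD:
        return Verdict.KNOWN
    if similarity <= LOW_THRESHOLD:
        return Verdict.UNKNOWN
    return Verdict.POSSIBLY_KNOWN


def resolve(similarity: float, ask_user: Callable[[], bool]) -> Verdict:
    """Settle the moderate case by asking the user whether the effect is known."""
    verdict = classify(similarity)
    if verdict is Verdict.POSSIBLY_KNOWN:
        return Verdict.KNOWN if ask_user() else Verdict.UNKNOWN
    return verdict
```

Passing the user prompt in as a callable keeps the automatic and the semi-automatic paths in one function; only the moderate band ever triggers the prompt.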

そして、副作用判定・学習部１０８は、自ら「既知」と判断したキーワードと、ユーザから「既知」という判断が入力されたキーワードに関して、既知の副作用として学習する。すなわち、副作用判定・学習部１０８は、学習結果テーブル２０４内の判定対象薬に対応するエントリにおいて、既知副作用リストを更新する。 Then, the side effect determination / learning unit 108 learns, as known side effects, the keywords that it has itself determined to be “known” and the keywords for which the user has entered a determination of “known”. That is, the side effect determination / learning unit 108 updates the known side effect list in the entry corresponding to the determination target drug in the learning result table 204.

さらに、副作用判定・学習部１０８は、判定結果に基づいて、安全性情報報告文書２０５から抽出されたキーワードと類薬の副作用キーワード群に含まれるキーワードのペアで、同義語同士と見なせるものを、同義語辞書２０３に追加する。 Further, based on the determination result, the side effect determination / learning unit 108 adds to the synonym dictionary 203 any pair, consisting of a keyword extracted from the safety information report document 205 and a keyword included in the side effect keyword group of a similar drug, that can be regarded as synonyms.

なお、副作用判定・学習部１０８が参照する学習結果テーブル２０４内の副作用キーワード群は、副作用キーワード抽出部１０７が前処理制御部１１３からの命令にしたがって行う前処理によって予め得られたものである。すなわち、副作用キーワード抽出部１０７は、上記に説明した動作のほかに、前処理として、添付文書群２０１内の各添付文書２０２について以下の前処理を行う。 The side effect keyword group in the learning result table 204 referred to by the side effect determination / learning unit 108 is obtained in advance by preprocessing performed by the side effect keyword extracting unit 107 in accordance with a command from the preprocessing control unit 113. That is, the side effect keyword extraction unit 107 performs the following preprocessing for each attached document 202 in the attached document group 201 as preprocessing in addition to the operation described above.

副作用キーワード抽出部１０７は、当該添付文書２０２の「副作用」セクションから、副作用を示すキーワードを抽出する。そして、副作用キーワード抽出部１０７は、当該添付文書２０２に対応するエントリを学習結果テーブル２０４内で検索し、検索されたエントリの副作用キーワード群に、当該添付文書２０２から抽出したキーワードを追加登録する。 The side effect keyword extraction unit 107 extracts a keyword indicating a side effect from the “side effect” section of the attached document 202. Then, the side effect keyword extraction unit 107 searches the learning result table 204 for an entry corresponding to the attached document 202, and additionally registers the keyword extracted from the attached document 202 in the side effect keyword group of the searched entry.

なお、本実施形態では、各添付文書２０２は、当該添付文書２０２が添付される医薬品のＩＤをファイル名に含むことによって、当該ＩＤと対応づけられている。よって、副作用キーワード抽出部１０７は、どの添付文書２０２から抽出したキーワードを学習結果テーブル２０４のどのエントリに登録すればよいかを認識することができる。 In the present embodiment, each attached document 202 is associated with the ID by including the ID of the medicine to which the attached document 202 is attached in the file name. Therefore, the side effect keyword extraction unit 107 can recognize to which entry of the learning result table 204 the keyword extracted from which attached document 202 should be registered.
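
As a small illustration of associating an attached document with its ID through the file name, the sketch below pulls an ID out of a path. The ID pattern (six digits, one uppercase letter, four digits) is inferred only from the example IDs that appear in FIG. 4, and the function name and the sample paths are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed ID shape, e.g. "111222A3333"; inferred from the example IDs only.
ID_PATTERN = re.compile(r"\d{6}[A-Z]\d{4}")


def drug_id_from_filename(path: str) -> Optional[str]:
    """Return the drug ID embedded in the file name, or None if there is none."""
    match = ID_PATTERN.search(Path(path).name)
    return match.group(0) if match else None
```

With such a helper, keywords extracted from a given attached document can be registered in the learning result table entry whose ID equals the returned value.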

しかし、添付文書２０２とＩＤを対応づける方法は実施形態に応じて任意である。例えば、学習結果テーブル２０４に添付文書２０２のファイル名を示すフィールドがあってもよく、当該フィールドによってＩＤと添付文書２０２が対応づけられていてもよい。つまり、副作用キーワード抽出部１０７は添付文書２０２とＩＤの対応づけを、当該フィールドを参照することで認識してもよい。また、医薬品医療機器情報提供ホームページでは、Standard Generalized Markup Language（ＳＧＭＬ）形式とPortable Document Format（ＰＤＦ）形式で添付文書が公開されているが、本実施形態では添付文書２０２のファイル形式は任意である。 However, the method for associating the attached document 202 with the ID is arbitrary and depends on the embodiment. For example, the learning result table 204 may include a field indicating the file name of the attached document 202, and the ID and the attached document 202 may be associated with each other by that field. That is, the side effect keyword extraction unit 107 may recognize the association between the attached document 202 and the ID by referring to that field. In addition, although the pharmaceuticals and medical devices information website publishes attached documents in Standard Generalized Markup Language (SGML) format and Portable Document Format (PDF) format, the file format of the attached document 202 is arbitrary in this embodiment.

以上のとおり、副作用処理部１０２は、安全性情報報告文書２０５で報告された副作用が既知か未知かを自動的又は半自動的に判定することができる。したがって、製薬会社は、厚生労働大臣への迅速な報告を行うことができる。 As described above, the side effect processing unit 102 can automatically or semi-automatically determine whether the side effect reported in the safety information report document 205 is known or unknown. Therefore, pharmaceutical companies can make prompt reports to the Minister of Health, Labor and Welfare.

また、詳しくは後述するが、本実施形態では、長い文字列同士の一致を短い文字列同士の一致よりも高く評価しつつも、長い文字列同士が必ずしも完全に一致していなくても部分的に一致していればある程度の評価を与える方針が採用されている。具体的には、キーワード類似度評価部１０９、副作用判定・学習部１０８及び類薬判定・学習部１１２が、上記方針にしたがって動作する。 As will be described in detail later, this embodiment adopts a policy of evaluating a match between long character strings more highly than a match between short character strings, while still giving some evaluation when long character strings match partially even if they do not match completely. Specifically, the keyword similarity evaluation unit 109, the side effect determination / learning unit 108, and the similar drug determination / learning unit 112 operate according to this policy.

上記方針によれば、副作用判定・学習部１０８による判定の精度を上げることができる。また、上記方針によれば、以下に述べる学習結果テーブル２０４の類薬リストの学習を、精度よく、かつ、なるべく漏れのないように行うことも可能となる。 According to the above policy, the accuracy of determination by the side effect determination / learning unit 108 can be increased. The above policy also makes it possible to learn the similar drug list in the learning result table 204, described below, accurately and with as few omissions as possible.

なぜなら、第１に、長い文字列同士が必ずしも完全に一致していなくても部分的に一致していればある程度の評価を与えるようにすることで、表記の揺れや用語の不統一を吸収することができ、類似概念を表すキーワード同士の類似性も評価可能となるからである。 The first reason is that giving some evaluation to a partial match between long character strings, even when they do not match completely, absorbs variations in notation and inconsistent terminology, and makes it possible to evaluate the similarity between keywords that represent similar concepts.

例えば、「全身麻酔剤」と「全身吸入麻酔剤」という２つのキーワードは、完全には一致しないが、「全身」と「麻酔剤」という部分文字列において一致し、意味的にも強い関連性を持っている。よって、キーワード間での文字列の部分的な一致に対して、評価をゼロとするのではなく、ある程度の評価を与えることで、キーワード同士の意味的な一致又は類似をうまく評価して、評価の精度を上げることができる。また、たとえ類似するキーワード間に表記の差があったとしても、文字列の部分的な一致に対してある程度の評価を与えることで、類似性を見落とすリスクが減るので、漏れのない学習が可能となる。 For example, the two keywords 「全身麻酔剤」 (“general anesthetic”) and 「全身吸入麻酔剤」 (“general inhalation anesthetic”) do not match completely, but they match in the partial character strings 「全身」 (“general”) and 「麻酔剤」 (“anesthetic”) and are also strongly related semantically. Therefore, giving some evaluation, rather than zero, to a partial match of character strings between keywords makes it possible to evaluate the semantic match or similarity between keywords well and to improve the accuracy of the evaluation. Moreover, even if similar keywords differ in notation, giving some evaluation to partial matches of character strings reduces the risk of overlooking a similarity, which enables learning without omissions.
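
To make the partial match in this example concrete, the following sketch lists the substrings that two keywords share. It uses Python's standard `difflib.SequenceMatcher` purely as a convenient stand-in; it is not the division-pattern evaluation performed by the keyword similarity evaluation unit 109, and the function name `shared_substrings` is an assumption.

```python
from difflib import SequenceMatcher
from typing import List


def shared_substrings(a: str, b: str) -> List[str]:
    """Return the matching blocks that keyword a and keyword b have in common."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # get_matching_blocks() ends with a zero-length sentinel, which is skipped.
    return [a[m.a:m.a + m.size]
            for m in matcher.get_matching_blocks() if m.size > 0]


print(shared_substrings("全身麻酔剤", "全身吸入麻酔剤"))  # ['全身', '麻酔剤']
```

A scoring scheme that returned zero for anything short of an exact match would treat these two keywords as unrelated, which is the situation the policy above is meant to avoid.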

また、第２の理由は、一般的な傾向として長いキーワードは意味的に限定された内容を表すことが多く、キーワード抽出におけるノイズの多くは短いキーワードだからである。そのため、長い文字列同士の一致を重視することで、ノイズの影響を低減することができる。 The second reason is that, as a general tendency, long keywords often represent semantically limited contents, and most of noise in keyword extraction is short keywords. Therefore, it is possible to reduce the influence of noise by placing importance on matching long character strings.

例えば、添付文書２０２の「副作用」セクションにおける「このような症状があらわれた場合は投与を中止してください。」などの自然文から「症状」という短いキーワードが抽出されるかもしれない。しかし、こうして抽出された「症状」というキーワードは具体的な副作用を示すものではない。他方で、「アナフィラキシー様症状」のような長いキーワードは、具体的な副作用を示すことが多い。よって、長い文字列同士の一致を重視することでノイズの影響を低減することができる。 For example, a short keyword “symptom” may be extracted from a natural sentence such as “please discontinue administration if such a symptom appears” in the “side effect” section of the package insert 202. However, the keyword “symptom” thus extracted does not indicate a specific side effect. On the other hand, long keywords such as “anaphylactoid symptoms” often show specific side effects. Therefore, the influence of noise can be reduced by placing importance on matching long character strings.

そして、第３の理由は、長い文字列同士の一致を重視することで、部分的な一致に起因する過度の評価の悪影響を抑えることができるからである。場合によっては、文字列の部分的な一致に対して評価を与えることで、キーワード間の類似度を過度に高く評価してしまうおそれがあるが、本実施形態では過度の評価に起因する悪影響を抑えることができる。 The third reason is that placing importance on matches between long character strings suppresses the adverse effect of excessive evaluation caused by partial matches. Giving an evaluation to partial matches of character strings may, in some cases, rate the similarity between keywords too highly, but this embodiment can suppress the adverse effects of such over-evaluation.

つまり、本実施形態では、長い文字列同士の一致を重視する方針のもと、いくつかの短い部分文字列同士が一致するキーワード間の類似度は、それらの部分文字列の合計の長さに相当する長い文字列が一致するキーワード間の類似度以下に評価される。例えば、ある２つのキーワードにおいて、偶然、離れた場所にある３文字が共通していたとしても、当該２つのキーワードの類似度は、連続する３文字を共有する別の２つのキーワード間の類似度以下にしか評価されない。よって、長い文字列同士の一致を重視することで、複数の短い部分文字列の偶然の一致によるノイズを低減させることができる。 That is, in this embodiment, under the policy of placing importance on matches between long character strings, the similarity between keywords in which several short partial character strings match is evaluated as no higher than the similarity between keywords in which a single long character string, whose length equals the total length of those partial character strings, matches. For example, even if two keywords happen to share three characters at separate positions, their similarity is evaluated as no higher than the similarity between another two keywords that share three consecutive characters. Placing importance on matches between long character strings therefore reduces noise caused by coincidental matches of multiple short partial character strings.
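
The property described here, that one long match is worth at least as much as several short matches of the same total length, holds for any superadditive score. The sketch below uses the square of the match length as one such score. The squared form is an assumption chosen for illustration, not the formula used by the partial character string similarity evaluation unit 110, and both function names are hypothetical.

```python
from typing import Iterable


def substring_score(length: int) -> int:
    """Hypothetical score for one matching substring: its length squared."""
    return length * length


def total_score(match_lengths: Iterable[int]) -> int:
    """Sum the scores of all matching substrings between two keywords."""
    return sum(substring_score(n) for n in match_lengths)


# Three isolated single-character matches versus one run of three characters.
scattered = total_score([1, 1, 1])   # 1 + 1 + 1 = 3
contiguous = total_score([3])        # 3 * 3 = 9
assert scattered <= contiguous
```

Because (a + b)² = a² + b² + 2ab, splitting any match into shorter pieces can only lower the total under this score, so scattered coincidental matches never outrank a contiguous match of the same length.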

以上のような理由から、本実施形態によれば、副作用判定・学習部１０８が高精度の判定を行うことができ、類薬判定・学習部１１２が類薬リストの学習を精度よく、かつ、なるべく漏れのないように行うこともできる。そして、精度よく、かつ、なるべく漏れのないように学習された類薬リストを利用することで、結局は、副作用判定・学習部１０８が判定精度を向上させることができる。 For the reasons above, according to this embodiment, the side effect determination / learning unit 108 can make highly accurate determinations, and the similar drug determination / learning unit 112 can learn the similar drug list accurately and with as few omissions as possible. Using a similar drug list learned in this way ultimately allows the side effect determination / learning unit 108 to improve its determination accuracy.

なぜなら、類薬リストの精度が悪いと、本当は判定対象薬の類薬ではない医薬品でのみ知られていた副作用を、副作用判定・学習部１０８が誤って「既知の副作用」と判定してしまうおそれがあるからである。また、類薬リストに漏れがあると、本当は判定対象薬の類薬において既知の副作用を、副作用判定・学習部１０８が誤って「未知の副作用」と判定してしまうおそれがあるからである。よって、精度よく、かつ、なるべく漏れのないように学習された類薬リストを使うことで、副作用判定・学習部１０８が判定精度を向上させることができる。 This is because, if the similar drug list is inaccurate, the side effect determination / learning unit 108 may erroneously determine that a side effect known only for a drug that is not actually a similar drug of the determination target drug is a “known side effect”. Likewise, if the similar drug list has omissions, the side effect determination / learning unit 108 may erroneously determine that a side effect that is actually known for a similar drug of the determination target drug is an “unknown side effect”. Therefore, using a similar drug list learned accurately and with as few omissions as possible allows the side effect determination / learning unit 108 to improve its determination accuracy.

したがって、本実施形態の判定装置１００を利用することで、製薬会社は、副作用についての正確な報告を、従来よりも迅速に厚生労働大臣に対して行うことができるようになる。 Therefore, by using the determination apparatus 100 of the present embodiment, the pharmaceutical company can make an accurate report on side effects to the Minister of Health, Labor and Welfare more quickly than before.

さて、上記のような学習結果テーブル２０４における類薬リストの学習は、本実施形態では前処理制御部１１３の命令にしたがって類薬処理部１０３により事前に行われる。すなわち、本実施形態では、類薬判定・学習部１１２が、添付文書群２０１の中の任意の２つの添付文書２０２の組について、当該２つの添付文書２０２に対応する２つの医薬品同士が類薬同士であるか否かを判定し、判定した結果を学習する。 Now, learning of the analog medicine list in the learning result table 204 as described above is performed in advance by the analog medicine processing unit 103 in accordance with an instruction from the preprocessing control unit 113 in this embodiment. In other words, in the present embodiment, the similar medicine determination / learning unit 112 determines that two medicines corresponding to the two attached documents 202 are similar to each other with respect to a set of arbitrary two attached documents 202 in the attached document group 201. It is determined whether or not each other, and the determined result is learned.

具体的には、効能・効果キーワード抽出部１１１が、まず、各添付文書２０２について、「効能又は効果」セクションからキーワードを抽出する。そして効能・効果キーワード抽出部１１１は、当該添付文書２０２に対応するエントリを学習結果テーブル２０４内で検索し、検索されたエントリの効能・効果キーワード群に、抽出したキーワードを追加登録する。なお、効能・効果キーワード抽出部１１１は、添付文書２０２とＩＤの対応づけを、例えば添付文書２０２のファイル名により認識することができる。 Specifically, the efficacy / effect keyword extraction unit 111 first extracts a keyword from the “efficacy or effect” section for each attached document 202. The efficacy / effect keyword extraction unit 111 searches the learning result table 204 for an entry corresponding to the attached document 202, and additionally registers the extracted keyword in the efficacy / effect keyword group of the searched entry. The effect / effect keyword extraction unit 111 can recognize the correspondence between the attached document 202 and the ID, for example, by the file name of the attached document 202.

また、類薬判定・学習部１１２は、類薬同士であるか否かを判定しようとする２つの添付文書２０２が「薬効分類名」又は「有効成分に関する理化学的知見」のセクションで一致していれば、２つの医薬品同士が類薬であると見なす。 The similar drug determination / learning unit 112 regards two drugs as similar drugs if the two attached documents 202 being compared match in the “medicinal effect classification name” section or the “physical and chemical knowledge regarding active ingredients” section.

他方、「薬効分類名」又は「有効成分に関する理化学的知見」のセクションでの一致が検出されない場合、類薬判定・学習部１１２は、２つの添付文書２０２の「効能又は効果」セクション同士の類似度を求める。つまり、類薬判定・学習部１１２は、２つの添付文書２０２にそれぞれ対応する学習結果テーブル２０４内のエントリにそれぞれ学習済みの効能・効果キーワード群同士の類似度を求める。 On the other hand, if no match is detected in the “medicinal effect classification name” section or the “physical and chemical knowledge regarding active ingredients” section, the similar drug determination / learning unit 112 obtains the similarity between the “efficacy or effect” sections of the two attached documents 202. That is, the similar drug determination / learning unit 112 obtains the similarity between the efficacy / effect keyword groups already learned in the entries of the learning result table 204 that correspond to the two attached documents 202.

具体的には、類薬判定・学習部１１２は、一方の添付文書２０２に対応するエントリの効能・効果キーワード群内のキーワードと、他方の２０２に対応するエントリの効能・効果キーワード群内のキーワードとの類似度をキーワード類似度評価部１０９に評価させる。そして、類薬判定・学習部１１２は、キーワード類似度評価部１０９による評価の結果を用いて、２つの添付文書２０２の「効能又は効果」セクション同士がどの程度類似しているかを判定する。 Specifically, the similar drug determination / learning unit 112 causes the keyword similarity evaluation unit 109 to evaluate the similarity between the keywords in the efficacy / effect keyword group of the entry corresponding to one attached document 202 and the keywords in the efficacy / effect keyword group of the entry corresponding to the other attached document 202. Then, the similar drug determination / learning unit 112 uses the result of the evaluation by the keyword similarity evaluation unit 109 to determine how similar the “efficacy or effect” sections of the two attached documents 202 are.

類似の度合が明らかに高ければ、類薬判定・学習部１１２は、当該２つの医薬品同士を「類薬同士」と判定し、判定結果を出力する。また、類似の度合が明らかに低ければ、類薬判定・学習部１１２は、当該２つの医薬品同士を「類薬同士ではない」と判定し、判定結果を出力する。 If the degree of similarity is clearly high, the similar medicine determination / learning unit 112 determines that the two medicines are “similar medicines” and outputs a determination result. If the degree of similarity is clearly low, the similar drug determination / learning unit 112 determines that the two medicines are not similar drugs, and outputs a determination result.

そして、類似の度合が中程度であれば、類薬判定・学習部１１２は、当該２つの医薬品について「類薬同士の可能性がある組み合わせ」と判定し、判定結果を出力する。この場合、類薬判定・学習部１１２は、「類薬同士の可能性がある」と判定された２つの医薬品が類薬同士か否かを入力するようユーザに促し、ユーザからの入力を受け取る。 If the degree of similarity is moderate, the similar drug determination / learning unit 112 determines that the two drugs are “a combination that may be similar drugs”, and outputs the determination result. In this case, the similar drug determination / learning unit 112 prompts the user to enter whether or not the two drugs determined to be “possibly similar drugs” are in fact similar drugs, and receives the input from the user.

そして、類薬判定・学習部１１２は、自ら「類薬同士」と判断した医薬品の組み合わせと、ユーザから「類薬同士」という判断が入力された医薬品同士の組み合わせに関して、類薬同士の関係を学習する。すなわち、類薬判定・学習部１１２は、類薬同士である２つの医薬品それぞれに対応する学習結果テーブル２０４内のエントリにおいて、類薬リストを更新する。 Then, the similar drug determination / learning unit 112 learns the similar-drug relationship for the combinations of drugs that it has itself determined to be “similar drugs” and for the combinations of drugs for which the user has entered a determination of “similar drugs”. That is, the similar drug determination / learning unit 112 updates the similar drug list in each of the entries in the learning result table 204 that correspond to the two drugs that are similar drugs.
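
The order of checks for a pair of attached documents, and the update of the similar drug list in both entries, can be sketched as follows. This is an illustration under stated assumptions, not the processing of the similar drug determination / learning unit 112 itself: the dictionary keys in `SHORTCUT_SECTIONS`, the thresholds `HIGH` and `LOW`, the string return values, and both function names are hypothetical.

```python
from typing import Dict, Mapping, Set

# Hypothetical section keys and thresholds; the text gives no concrete values.
SHORTCUT_SECTIONS = ("medicinal_effect_classification", "physicochemical_findings")
HIGH, LOW = 0.8, 0.3


def judge_pair(insert_a: Mapping[str, str], insert_b: Mapping[str, str],
               efficacy_similarity: float) -> str:
    """Return 'similar', 'not_similar', or 'ask_user' for two attached documents."""
    # A match in either shortcut section settles the question immediately.
    for section in SHORTCUT_SECTIONS:
        text_a, text_b = insert_a.get(section), insert_b.get(section)
        if text_a and text_a == text_b:
            return "similar"
    # Otherwise fall back to the similarity of the efficacy / effect keywords.
    if efficacy_similarity >= HIGH:
        return "similar"
    if efficacy_similarity <= LOW:
        return "not_similar"
    return "ask_user"


def record_similar_pair(similar_lists: Dict[str, Set[str]],
                        id_a: str, id_b: str) -> None:
    """Add each drug to the other's similar drug list, keeping the relation symmetric."""
    similar_lists.setdefault(id_a, set()).add(id_b)
    similar_lists.setdefault(id_b, set()).add(id_a)
```

Updating both entries at once is what keeps the lists mutually consistent, so that a lookup starting from either drug finds the other.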

さらに、類薬判定・学習部１１２は、２つの添付文書２０２の「効能又は効果」セクション同士の類似度にしたがって２つの医薬品同士を類薬と判定した場合、効能又は効果を示すキーワードのペアで同義語同士と見なせるものを、同義語辞書２０３に追加する。 Furthermore, when the similar drug determination / learning unit 112 has determined two drugs to be similar drugs according to the similarity between the “efficacy or effect” sections of the two attached documents 202, it adds to the synonym dictionary 203 any pair of keywords indicating an efficacy or effect that can be regarded as synonyms.

ところで、図１の判定装置１００は、専用のハードウェア回路、プログラムを実行する汎用の情報処理装置、あるいはその組み合わせにより実現可能だが、本実施形態では、プログラムを実行する情報処理装置により判定装置１００が実現される。そこで、続いて、判定装置１００の各部が具体的にはどのようなハードウェアを用いて実現されるのかを説明する。 Incidentally, the determination apparatus 100 of FIG. 1 can be realized by a dedicated hardware circuit, a general-purpose information processing apparatus that executes a program, or a combination thereof; in this embodiment, the determination apparatus 100 is realized by an information processing apparatus that executes a program. The following therefore describes the specific hardware used to implement each unit of the determination apparatus 100.

図２は、コンピュータの構成図である。図２のコンピュータ３００は、プログラムを実行する汎用の情報処理装置の一例である。 FIG. 2 is a configuration diagram of the computer. A computer 300 in FIG. 2 is an example of a general-purpose information processing apparatus that executes a program.
コンピュータ３００は、Central Processing Unit（ＣＰＵ）３０１、Read Only Memory（ＲＯＭ）３０２、Random Access Memory（ＲＡＭ）３０３及び通信インタフェース３０４を有する。また、コンピュータ３００は、入力装置３０５、出力装置３０６、記憶装置３０７及び可搬型記憶媒体３１０の駆動装置３０８を有する。そして、ＣＰＵ３０１、ＲＯＭ３０２、ＲＡＭ３０３、通信インタフェース３０４、入力装置３０５、出力装置３０６、記憶装置３０７及び駆動装置３０８は、バス３０９により互いに接続されている。 The computer 300 includes a central processing unit (CPU) 301, a read only memory (ROM) 302, a random access memory (RAM) 303, and a communication interface 304. The computer 300 also includes an input device 305, an output device 306, a storage device 307, and a drive device 308 for a portable storage medium 310. The CPU 301, ROM 302, RAM 303, communication interface 304, input device 305, output device 306, storage device 307, and drive device 308 are connected to each other via a bus 309.

入力装置３０５は、例えば、キーボードでもよいし、マウスなどのポインティングデバイスデバイスでもよいし、その組み合わせでもよい。出力装置３０６は、例えば、液晶ディスプレイなどのディスプレイ、スピーカ、プリンタ、又はそれらの任意の組み合わせである。 The input device 305 may be, for example, a keyboard, a pointing device such as a mouse, or a combination thereof. The output device 306 is, for example, a display such as a liquid crystal display, a speaker, a printer, or any combination thereof.

記憶装置３０７は、ハードディスク装置、フラッシュメモリなどの不揮発性の半導体メモリ装置、又はその組み合わせである。また、可搬型記憶媒体３１０としては、Compact Disc（ＣＤ）やDigital Versatile Disk（ＤＶＤ）などの光ディスク、光磁気ディスク、磁気ディスク、フラッシュメモリなどの不揮発性の半導体メモリカードなどが利用可能である。 The storage device 307 is a hard disk device, a nonvolatile semiconductor memory device such as a flash memory, or a combination thereof. As the portable storage medium 310, an optical disk such as a Compact Disc (CD) or a Digital Versatile Disk (DVD), a magneto-optical disk, a magnetic disk, a nonvolatile semiconductor memory card such as a flash memory, or the like can be used.

コンピュータ３００は、図２に示すように、通信インタフェース３０４とネットワーク３１１を介して、他のコンピュータ３１２と接続されていてもよい。ネットワーク３１１は、Local Area Network（ＬＡＮ）やインターネットなどの任意のネットワークでよい。 The computer 300 may be connected to another computer 312 via a communication interface 304 and a network 311 as shown in FIG. The network 311 may be an arbitrary network such as a local area network (LAN) or the Internet.

ＣＰＵ３０１は、ＲＡＭ３０３にプログラムをロードしてＲＡＭ３０３をワークエリアとして用いながらプログラムを実行することにより、図１の副作用処理部１０２、類薬処理部１０３及び前処理制御部１１３の機能を実現することができる。そして、ＣＰＵ３０１が実行する上記プログラムは、予めＲＯＭ３０２又は記憶装置３０７にインストールされていてもよいし、可搬型記憶媒体３１０に格納されて提供され、駆動装置３０８により読み取られて記憶装置３０７にコピーされてもよい。あるいは、上記プログラムは、他のコンピュータ３１２からネットワーク３１１を介して記憶装置３０７にダウンロードされてもよい。 The CPU 301 loads the program into the RAM 303 and executes the program while using the RAM 303 as a work area, thereby realizing the functions of the side effect processing unit 102, the similar medicine processing unit 103, and the preprocessing control unit 113 in FIG. it can. The program executed by the CPU 301 may be installed in the ROM 302 or the storage device 307 in advance, provided by being stored in the portable storage medium 310, read by the drive device 308, and copied to the storage device 307. May be. Alternatively, the program may be downloaded from the other computer 312 to the storage device 307 via the network 311.

より具体的に本実施形態における図１と図２の対応を説明すると、次のとおりである。 More specifically, the correspondence between FIG. 1 and FIG. 2 in this embodiment is as follows.
図１の格納部１０１は、図２の記憶装置３０７により実現される。また、図１の判定対象薬指定部１０４は、図２の入力装置３０５とＣＰＵ３０１により実現される。つまり、判定対象薬ＩＤ２０６は入力装置３０５から入力され、ＣＰＵ３０１により認識される。そして、図１の類薬認識部１０５は、ＣＰＵ３０１により実現される。 The storage unit 101 in FIG. 1 is realized by the storage device 307 in FIG. 2. The determination target drug specifying unit 104 in FIG. 1 is realized by the input device 305 and the CPU 301 in FIG. 2. That is, the determination target drug ID 206 is input from the input device 305 and recognized by the CPU 301. The similar drug recognition unit 105 in FIG. 1 is realized by the CPU 301.

また、図１の安全性情報報告文書２０５は、他のコンピュータ３１２から与えられてもよく、その場合、図１の報告文書取得部１０６は、通信インタフェース３０４とＣＰＵ３０１により実現されてもよい。あるいは、安全性情報報告文書２０５は、可搬型記憶媒体３１０から読み込まれてもよく、その場合、報告文書取得部１０６は、可搬型記憶媒体３１０の駆動装置３０８とＣＰＵ３０１により実現されてもよい。あるいは、安全性情報報告文書２０５は、入力装置３０５から入力されてもよく、その場合、報告文書取得部１０６は、入力装置３０５とＣＰＵ３０１により実現されてもよい。 Further, the safety information report document 205 of FIG. 1 may be given from another computer 312, and in that case, the report document acquisition unit 106 of FIG. 1 may be realized by the communication interface 304 and the CPU 301. Alternatively, the safety information report document 205 may be read from the portable storage medium 310, and in that case, the report document acquisition unit 106 may be realized by the drive device 308 and the CPU 301 of the portable storage medium 310. Alternatively, the safety information report document 205 may be input from the input device 305, and in that case, the report document acquisition unit 106 may be realized by the input device 305 and the CPU 301.

また、どこから与えられるにしろ、安全性情報報告文書２０５は、記憶装置３０７又はＲＡＭ３０３に格納される。つまり、報告文書取得部１０６が安全性情報報告文書２０５を出力する先である副作用キーワード抽出部１０７は、さらに記憶装置３０７又はＲＡＭ３０３を含む。 Moreover, the safety information report document 205 is stored in the storage device 307 or the RAM 303 regardless of where it is given. That is, the side effect keyword extraction unit 107 to which the report document acquisition unit 106 outputs the safety information report document 205 further includes the storage device 307 or the RAM 303.

そして、図１の副作用キーワード抽出部１０７はＣＰＵ３０１により実現される。また、図１の副作用判定・学習部１０８は、プログラムにしたがって処理を行うＣＰＵ３０１、並びに、ユーザインタフェースを実現するための出力装置３０６及び入力装置３０５によって実現される。 The side effect keyword extraction unit 107 in FIG. 1 is realized by the CPU 301. The side effect determination / learning unit 108 in FIG. 1 is realized by the CPU 301, which performs processing according to a program, together with the output device 306 and the input device 305, which implement a user interface.

そして、図１の部分文字列類似度評価部１１０を含むキーワード類似度評価部１０９と効能・効果キーワード抽出部１１１も、ＣＰＵ３０１によって実現することができる。また、類薬判定・学習部１１２は、副作用判定・学習部１０８と同様に、ＣＰＵ３０１と出力装置３０６と入力装置３０５によって実現される。そして、前処理制御部１１３も、ＣＰＵ３０１と、前処理の開始を指示する入力を受け取る入力装置３０５によって実現することができる。 The keyword similarity evaluation unit 109 in FIG. 1, which includes the partial character string similarity evaluation unit 110, and the efficacy / effect keyword extraction unit 111 can also be realized by the CPU 301. Like the side effect determination / learning unit 108, the similar drug determination / learning unit 112 is realized by the CPU 301, the output device 306, and the input device 305. The preprocessing control unit 113 can also be realized by the CPU 301 and the input device 305, which receives an input instructing the start of the preprocessing.

以上、例えば図２のコンピュータ３００により実現される図１の判定装置１００について、構成と動作の概略を説明し、また、判定装置１００が利用する情報の概略についても説明した。そこで、以下では副作用処理部１０２の動作について、図３〜１０を参照してさらに詳しく説明する。 The outline of the configuration and operation of the determination apparatus 100 of FIG. 1, realized for example by the computer 300 of FIG. 2, has been described above, along with an outline of the information used by the determination apparatus 100. The operation of the side effect processing unit 102 is described below in more detail with reference to FIGS. 3 to 10.

図３は、副作用判定・学習処理のフローチャートである。 FIG. 3 is a flowchart of the side effect determination / learning process.
ステップＳ１０１で判定対象薬指定部１０４は、判定対象薬ＩＤ２０６の入力を受け取ることで、判定対象薬を特定し、判定対象薬ＩＤ２０６を類薬認識部１０５と副作用判定・学習部１０８に出力する。 In step S101, the determination target drug specifying unit 104 receives the input of the determination target drug ID 206, identifies the determination target drug, and outputs the determination target drug ID 206 to the similar drug recognition unit 105 and the side effect determination / learning unit 108.

すると、次のステップＳ１０２で類薬認識部１０５は、判定対象薬指定部１０４から入力された判定対象薬ＩＤ２０６を検索キーにして学習結果テーブル２０４を検索し、ＩＤが判定対象薬ＩＤ２０６と一致するエントリから類薬リストを取得する。 Then, in the next step S102, the similar drug recognition unit 105 searches the learning result table 204 using the determination target drug ID 206 input from the determination target drug specifying unit 104 as a search key, and the ID matches the determination target drug ID 206. Get a list of similar drugs from an entry.

ここで、学習結果テーブル２０４の例について、図４の例を参照してより詳しく説明する。 Here, the learning result table 204 is described in more detail with reference to the example of FIG. 4.
学習結果テーブル２０４は、前述のとおり、「ＩＤ」、「効能・効果キーワード群」、「類薬リスト」、「副作用キーワード群」及び「既知副作用リスト」というフィールドを有する。詳しくは後述するが、既知副作用リストは図３のステップＳ１１６又はＳ１１７で設定され、それ以外のフィールドは、図１１〜１４とともに後述する処理によって予め設定される。 As described above, the learning result table 204 has the fields “ID”, “efficacy / effect keyword group”, “similar drug list”, “side effect keyword group”, and “known side effect list”. As will be described in detail later, the known side effect list is set in step S116 or S117 of FIG. 3, and the other fields are set in advance by the processing described later with reference to FIGS. 11 to 14.

図４には、次の（ａ１）〜（ａ５）に説明するエントリを含む学習結果テーブル２０４が例示されている。なお、図４では、紙幅の都合上省略したいくつかのキーワードを「……」と示してある。 FIG. 4 illustrates a learning result table 204 including the entries described in (a1) to (a5) below. In FIG. 4, some keywords omitted for reasons of space are indicated as “……”.

（ａ１）「１１１２２２Ａ３３３３」というＩＤを持つエントリ (a1) Entry with ID “111222A3333”
このエントリにおける効能・効果キーワード群は、「気管支炎」というキーワードを含む。また、このエントリにおける類薬リストは、図４には不図示のエントリを示す「３９６３９６Ｂ７７７７」というＩＤと、下記（ａ５）のエントリを示す「９９８８７７Ｆ５０５０」というＩＤを含む。そして、このエントリにおいて、副作用キーワード群は「頭痛」と「肺出血」というキーワードを含み、既知副作用リストは「頭痛」というキーワードを含む。 The efficacy / effect keyword group in this entry includes the keyword “bronchitis”. The similar drug list in this entry includes the ID “396396B7777”, which indicates an entry not shown in FIG. 4, and the ID “998877F5050”, which indicates the entry (a5) below. In this entry, the side effect keyword group includes the keywords “headache” and “pulmonary hemorrhage”, and the known side effect list includes the keyword “headache”.

例えば、「１１１２２２Ａ３３３３」というＩＤの医薬品に関して新たに臨床現場から頭痛という副作用が報告されたときに、「３９６３９６Ｂ７７７７」というＩＤが示す類薬において頭痛が副作用として既知であったとする。そして、その後、「１１１２２２Ａ３３３３」というＩＤの医薬品の添付文書２０２が改訂され、「副作用」セクションに頭痛が追記されたとする。例えば以上のような場合に、この（ａ１）のエントリは、図４に示した状態となる。 For example, when a side effect of a headache is newly reported from a clinical site regarding a drug with an ID of “111222A3333”, the headache is known as a side effect in the similar drug indicated by the ID of “396396B7777”. After that, it is assumed that the package insert 202 of the medicine with the ID “111222A3333” is revised and a headache is added to the “side effect” section. For example, in the above case, the entry (a1) is in the state shown in FIG.

（ａ２）「２３４５６７Ｆ０９０９」というＩＤを持つエントリ (a2) Entry with ID “234567F0909”
このエントリにおける効能・効果キーワード群は、「蕁麻疹」と「湿疹」というキーワードを含む。また、このエントリにおける類薬リストは、図４には不図示のエントリを示す「５６７５６７Ａ１２１２」というＩＤを含む。そして、このエントリにおいて、副作用キーワード群は「糖尿病」と「貧血」というキーワードを含み、既知副作用リストは空である。 The efficacy / effect keyword group in this entry includes the keywords “urticaria” and “eczema”. The similar drug list in this entry includes the ID “567567A1212”, which indicates an entry not shown in FIG. 4. In this entry, the side effect keyword group includes the keywords “diabetes” and “anemia”, and the known side effect list is empty.

(a3) Entry with ID “444555A7777”
The efficacy/effect keyword group of this entry includes the keywords “arthritis” and “acute upper respiratory tract inflammation”. The similar drug list of this entry includes the ID “777888C9090”, which refers to entry (a4) below, and the ID “998877F5050”, which refers to entry (a5) below. The side effect keyword group of this entry includes the keywords “anemia” and “pulmonary hemorrhage”, and the known side effect list is empty.

(a4) Entry with ID “777888C9090”
The efficacy/effect keyword group of this entry includes the keywords “acute upper respiratory tract inflammation” and “antipyretic”. The similar drug list of this entry includes the ID “444555A7777”, which refers to entry (a3) above, and the ID “998877F5050”, which refers to entry (a5) below. The side effect keyword group of this entry includes the keywords “heart failure” and “gastric ulcer”, and the known side effect list is empty.

(a5) Entry with ID “998877F5050”
The efficacy/effect keyword group of this entry includes the keywords “acute upper respiratory tract inflammation”, “antipyretic”, and “bronchitis”. The similar drug list of this entry includes the ID “111222A3333”, which refers to entry (a1) above, the ID “444555A7777”, which refers to entry (a3) above, and the ID “777888C9090”, which refers to entry (a4) above. The side effect keyword group of this entry includes the keywords “blood pressure reduction” and “anemia”, and the known side effect list includes the keyword “heart failure”.

For example, suppose that heart failure was once reported from a clinical site as a side effect of the drug with ID “998877F5050”, which corresponds to entry (a5). Suppose also that, at that time, heart failure was described as a side effect in the package insert 202 of the similar drug corresponding to entry (a4), so that the side effect keyword group of entry (a4) contained the keyword “heart failure”, as shown in FIG. 4. As a result, heart failure was learned as “a side effect known in a similar drug”, while the package insert 202 of the drug corresponding to entry (a5) has not yet been revised. In this situation, entry (a5) is in the state shown in FIG. 4.
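The entries (a1) to (a5) can be pictured as records keyed by drug ID. The following is a minimal sketch of such a structure, not the format used by the embodiment: the class name `LearningEntry`, its field names, and the dictionary `LEARNING_TABLE` are assumptions introduced for illustration, and keywords elided as “……” in FIG. 4 are deliberately not filled in.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LearningEntry:
    """One row of the learning result table (field names are hypothetical)."""
    drug_id: str
    efficacy_keywords: List[str]
    similar_drug_ids: List[str]
    side_effect_keywords: List[str]
    known_side_effects: List[str] = field(default_factory=list)


# Only the keywords named in the text for (a1) to (a5) are listed here.
_ENTRIES = [
    LearningEntry("111222A3333", ["bronchitis"],
                  ["396396B7777", "998877F5050"],
                  ["headache", "pulmonary hemorrhage"], ["headache"]),
    LearningEntry("234567F0909", ["urticaria", "eczema"],
                  ["567567A1212"], ["diabetes", "anemia"]),
    LearningEntry("444555A7777",
                  ["arthritis", "acute upper respiratory tract inflammation"],
                  ["777888C9090", "998877F5050"],
                  ["anemia", "pulmonary hemorrhage"]),
    LearningEntry("777888C9090",
                  ["acute upper respiratory tract inflammation", "antipyretic"],
                  ["444555A7777", "998877F5050"],
                  ["heart failure", "gastric ulcer"]),
    LearningEntry("998877F5050",
                  ["acute upper respiratory tract inflammation", "antipyretic",
                   "bronchitis"],
                  ["111222A3333", "444555A7777", "777888C9090"],
                  ["blood pressure reduction", "anemia"], ["heart failure"]),
]

# Index by drug ID so that a lookup by determination target drug ID is direct.
LEARNING_TABLE: Dict[str, LearningEntry] = {e.drug_id: e for e in _ENTRIES}
```

Keying the table by ID mirrors how the text looks up an entry whose ID matches the determination target drug ID 206.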

Returning now to the description of FIG. 3, assume for convenience of explanation that the determination target drug ID 206 input in step S101 is “998877F5050”, which indicates entry (a5) above. In that case, in step S102 the similar drug recognition unit 105 acquires the similar drug list (111222A3333, 444555A7777, 777888C9090).

In step S103, the report document acquisition unit 106 receives input of the safety information report document 205 concerning the determination target drug and outputs the safety information report document 205 to the side effect keyword extraction unit 107.

Note that step S103 may be performed before steps S101 to S102. In some embodiments, the safety information report document 205 may include the ID of the drug whose side effects are being reported. In that case, instead of receiving an explicit input of the determination target drug ID 206 in step S101, the determination target drug specifying unit 104 may identify the determination target drug by extracting the determination target drug ID 206 from the safety information report document 205 received in step S103.

After step S103, in step S104 the side effect keyword extraction unit 107 extracts keywords indicating side effects of the determination target drug from the safety information report document 205 and outputs the extracted keywords to the side effect determination/learning unit 108. The algorithm that the side effect keyword extraction unit 107 uses for keyword extraction may vary depending on the embodiment.

For example, the side effect keyword extraction unit 107 may perform morphological analysis on the safety information report document 205 and extract each sequence of one or more consecutive nouns as a keyword. The side effect keyword extraction unit 107 may further perform syntactic analysis on the result of the morphological analysis and extract keywords using the result of the syntactic analysis.

Alternatively, depending on the format of the safety information report document 205, the side effect keyword extraction unit 107 may be able to obtain keywords simply by cutting out predetermined items from the safety information report document 205.

Of course, even when performing processing such as morphological analysis, the side effect keyword extraction unit 107 can analyze only an appropriate part of the safety information report document 205, depending on its format. For example, when the safety information report document 205 includes a field for information such as the medical history of the patient in whom the side effect appeared, a field describing the side effect, and a field giving the name and other details of the reporting physician, the side effect keyword extraction unit 107 need only extract the data in the field describing the side effect and perform morphological analysis on it.

Alternatively, the side effect keyword extraction unit 107 may perform simple keyword extraction based on character type. For example, in view of the characteristics of medical terms, the side effect keyword extraction unit 107 may extract each sequence of kanji, katakana, or alphabetic characters as a keyword.
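As an illustration of character-type-based extraction, the sketch below pulls out maximal runs of kanji, katakana, or ASCII letters with a regular expression. It rests on assumptions not stated in the text: that a keyword is one maximal run in which the three character types may be mixed, and that the Unicode ranges U+4E00 to U+9FFF and U+30A0 to U+30FF are adequate for kanji and katakana. The function name is hypothetical.

```python
import re
from typing import List

# Assumption: a keyword is a maximal run of kanji (CJK Unified Ideographs),
# katakana, or ASCII letters; hiragana and punctuation act as separators.
_KEYWORD_RUN = re.compile(r"[\u4E00-\u9FFF\u30A0-\u30FFA-Za-z]+")


def extract_keywords_by_character_type(text: str) -> List[str]:
    """Return every maximal kanji/katakana/letter run, in order of appearance."""
    return _KEYWORD_RUN.findall(text)
```

For the input “患者に頭痛と脳出血がみられた”, this returns “患者”, “頭痛”, and “脳出血”. The first of these (“patient”) is not a side effect, which shows why such a simple rule would likely need filtering, for example against a dictionary of side effect terms.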

Further, if, for example, dictionary data of various side effects is available, the side effect keyword extraction unit 107 may extract from the safety information report document 205, as keywords, the character strings that match dictionary entries.
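Dictionary-based extraction can be sketched as a substring search for each dictionary entry. This is a deliberately naive illustration: the function name and the sample dictionary are hypothetical, and the text does not say how overlapping or repeated matches are handled, so the sketch simply reports each matching entry once, ordered by first occurrence.

```python
from typing import Iterable, List


def extract_keywords_by_dictionary(text: str, dictionary: Iterable[str]) -> List[str]:
    """Return the dictionary entries that occur in text, by first occurrence."""
    # Record (position of first occurrence, term) for every entry found.
    hits = [(text.find(term), term) for term in dictionary if term in text]
    return [term for _, term in sorted(hits)]


# Hypothetical dictionary and report text, for illustration only.
side_effect_dictionary = {"headache", "heart failure", "cerebral hemorrhage"}
report = "The patient developed heart failure and a severe headache."
found = extract_keywords_by_dictionary(report, side_effect_dictionary)
```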

In any case, the side effect keyword extraction unit 107 extracts keywords indicating side effects from the safety information report document 205 acquired by the report document acquisition unit 106, using an algorithm appropriate to the embodiment. The side effect keyword extraction unit 107 then notifies the side effect determination/learning unit 108 of the extracted keywords.

Note that step S104 may extract a single keyword or several. In the following, for convenience of explanation, it is assumed that the three keywords “heart failure”, “headache”, and “cerebral hemorrhage” were extracted in step S104.

In the subsequent step S105, the side effect determination/learning unit 108 determines whether any of the side effects reported in the safety information report document 205 remain unprocessed. That is, the side effect determination/learning unit 108 determines whether the processing from step S106 onward has been performed for all the keywords notified by the side effect keyword extraction unit 107 in step S104.

If an unprocessed side effect remains, the process proceeds to step S106. Conversely, if all the side effects have been processed, the process proceeds to step S117. Where there is no risk of misunderstanding, a “keyword indicating a side effect” may be referred to simply as a “side effect” to simplify the description.

In step S106, the side effect determination/learning unit 108 selects one unprocessed side effect (that is, one unprocessed keyword). Hereinafter, the side effect selected in step S106 is referred to as the “selected side effect”. For example, when the three keywords “heart failure”, “headache”, and “cerebral hemorrhage” have been extracted in step S104 as described above, the side effect determination/learning unit 108 may select “heart failure” in the first execution of step S106.

In the next step S107, the side effect determination/learning unit 108 determines whether the selected side effect has already been learned. Specifically, the side effect determination/learning unit 108 searches the learning result table 204 for the entry whose ID matches the determination target drug ID 206 and acquires the known side effect list of the entry found.

If the acquired known side effect list contains an element that matches the selected side effect or a synonym of the selected side effect, the side effect determination/learning unit 108 determines that “the selected side effect has been learned”. Conversely, if the acquired known side effect list contains neither an element that matches the selected side effect nor an element that matches a synonym of the selected side effect, the side effect determination/learning unit 108 determines that “the selected side effect has not been learned”.

Note that, by searching the synonym dictionary 203 with the selected side effect as the search key, the side effect determination/learning unit 108 can recognize whether the selected side effect has a synonym and, if one exists, can also recognize that synonym.

If it is determined in step S107 that the selected side effect has been learned, the process proceeds to step S108. Conversely, if the selected side effect has not been learned, the process proceeds to step S109.

For example, when “heart failure” is selected in step S106 as in the example above, the known side effect list of entry (a5), which corresponds to the determination target drug in the learning result table 204 of FIG. 4, contains the element “heart failure”, so the selected side effect has been learned. The process therefore proceeds to step S108.

Conversely, when “headache” or “cerebral hemorrhage” is selected in step S106, it is determined in step S107 that “the selected side effect has not been learned”. This is because the known side effect list of entry (a5) contains neither “headache” or a synonym of it, nor “cerebral hemorrhage” or a synonym of it.
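The check in step S107 amounts to a membership test on the known side effect list, widened by synonyms. The sketch below illustrates this under assumptions: the synonym dictionary 203 is modeled as a plain mapping from a term to a set of synonyms, and both the function name and the sample synonym pair are hypothetical.

```python
from typing import Dict, Iterable, Set


def is_learned(selected_side_effect: str,
               known_side_effects: Iterable[str],
               synonym_dictionary: Dict[str, Set[str]]) -> bool:
    """Return True if the selected side effect, or a synonym, is already known."""
    # The selected side effect itself plus any synonyms found for it.
    acceptable = {selected_side_effect} | synonym_dictionary.get(
        selected_side_effect, set())
    return any(known in acceptable for known in known_side_effects)


# Known side effect list of entry (a5); the synonym pair is invented.
known_a5 = ["heart failure"]
synonyms = {"cardiac failure": {"heart failure"}}
```

With these inputs, “heart failure” is judged learned (leading to step S108), while “headache” and “cerebral hemorrhage” are not (leading to step S109), matching the example in the text.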

In step S108, the side effect determination/learning unit 108 outputs an indication that the selected side effect is known. The process then returns to step S105.
The output in step S108 will now be described in more detail with reference to FIG. 5. FIG. 5 is a diagram illustrating an example of the side effect determination result screen.

The side effect determination result screen 400 of FIG. 5 includes a determination target drug display field 401, which indicates the determination target drug, and a determination result list 402, which shows the results of the processing of FIG. 3. Depending on the results of the processing, the determination result list 402 may include radio buttons 403 for letting the user judge whether a given side effect is known or unknown, and the side effect determination result screen 400 may further include a learning button 404. The radio buttons 403 and the learning button 404 are described later together with step S118, so their description is omitted here.

The side effect determination/learning unit 108 of the present embodiment outputs the side effect determination result screen 400 of FIG. 5 to a display corresponding to the output device 306 of FIG. 2.
Although omitted from the description of FIG. 3 above, the side effect determination/learning unit 108 may output the determination target drug display field 401 and the header row of the determination result list 402 when, for example, it receives the keyword notification from the side effect keyword extraction unit 107 in step S104. Alternatively, the side effect determination/learning unit 108 may output the determination target drug display field 401 and the header row when it executes step S108 or step S116 for the first time.

When the selected side effect is “heart failure” as in the example above and step S107 finds that the selected side effect has been learned, the side effect determination/learning unit 108 may, in step S108, output the “heart failure” row of the determination result list 402 of FIG. 5 to the display.

In the example of FIG. 5, the determination target drug display field 401 indicates the determination target drug by displaying the determination target drug ID. Of course, depending on the embodiment, the side effect determination/learning unit 108 may, for example, read the brand name or generic name of the determination target drug from its package insert 202 and output it to the determination target drug display field 401.

Note that the side effect determination/learning unit 108 of the present embodiment embeds, in the determination target drug ID shown in the determination target drug display field 401, a link to the package insert 202 of the determination target drug. By clicking the link, the user can therefore easily locate and consult the package insert 202 of the determination target drug and check the validity of the determination result produced by the determination apparatus 100.

The determination result list 402 illustrated in FIG. 5 includes a header row and one row for each side effect keyword extracted from the safety information report document 205. The example of FIG. 5 corresponds to the case where the three keywords “heart failure”, “headache”, and “cerebral hemorrhage” were extracted in step S104 of FIG. 3, so there are three rows below the header row, one for each of these keywords.

The header row of the determination result list 402 includes the headings “side effect”, “determination based on learning results”, “determination by comparison with similar drugs”, and “judgment input field”. As illustrated in FIG. 4, the similar drug list of the drug whose ID is “998877F5050” is (111222A3333, 444555A7777, 777888C9090). Accordingly, under the heading “determination by comparison with similar drugs” there are three columns corresponding to these three similar drugs, and the header row displays the ID of each of them.

Note that the side effect determination/learning unit 108 of the present embodiment embeds, in each of the three similar drug IDs, a link to the package insert 202 of that similar drug. By clicking a link, the user can therefore easily locate and consult the package insert 202 of the similar drug and check the validity of the determination result produced by the determination apparatus 100.

The first row of the determination result list 402 of FIG. 5 is the row that is output when “heart failure” is selected as the selected side effect in step S106 of FIG. 3. In this case, the side effect determination/learning unit 108 determines in step S107 that the selected side effect has already been learned as a known side effect. Accordingly, in step S108 the side effect determination/learning unit 108 outputs the keyword “heart failure”, which indicates the selected side effect, in the “side effect” column of the determination result list 402 and outputs “known” in the “determination based on learning results” column.

In this case, no comparison with similar drugs is needed. Accordingly, in step S108 the side effect determination/learning unit 108 of the present embodiment outputs the character “-”, which indicates that no comparison with similar drugs is performed, in each of the three columns under the heading “determination by comparison with similar drugs”.

Likewise, there is no need in this case to ask the user for a judgment. Accordingly, in step S108 the side effect determination/learning unit 108 of the present embodiment also outputs the character “-” in the “judgment input field” column.
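The row output in step S108 for a learned side effect can be sketched as a simple list of cell values. The function name and the list representation are assumptions for illustration; the actual screen is a graphical table, as FIG. 5 shows.

```python
from typing import List


def build_known_row(side_effect: str, similar_drug_ids: List[str]) -> List[str]:
    """Build the cells of a determination result row for a learned side effect.

    Columns: side effect, determination based on learning results,
    one column per similar drug, judgment input field.
    """
    comparison_cells = ["-"] * len(similar_drug_ids)  # no comparison performed
    return [side_effect, "known"] + comparison_cells + ["-"]


row = build_known_row("heart failure",
                      ["111222A3333", "444555A7777", "777888C9090"])
```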

Returning now to the description of FIG. 3: when the side effect determination/learning unit 108 determines in step S107 that “the selected side effect has not been learned”, it initializes a “known side effect learning list” and a “known side effect candidate list” to empty in step S109. The known side effect learning list is a list whose elements are the IDs of similar drugs whose package inserts 202 describe a side effect that is highly similar to the selected side effect currently under consideration. The known side effect candidate list is a list whose elements are the IDs of similar drugs whose package inserts 202 describe a side effect that is moderately similar to the selected side effect currently under consideration. After the two lists are initialized, the process proceeds to step S110.

In step S110, the side effect determination/learning unit 108 determines whether any similar drug of the determination target drug remains unprocessed with respect to the selected side effect currently under consideration. That is, the side effect determination/learning unit 108 determines whether the similar drug list notified by the similar drug recognition unit 105 in step S102 still contains an ID for which the processing from step S111 onward has not been performed with respect to the selected side effect currently under consideration.

If a similar drug remains unprocessed with respect to the selected side effect currently under consideration, the process proceeds to step S111. Conversely, if all the IDs in the similar drug list notified in step S102 have been processed with respect to that selected side effect, the process proceeds to step S116.

In step S111, the side effect determination/learning unit 108 selects, from among the similar drugs of the determination target drug, one that is unprocessed with respect to the selected side effect currently under consideration. Hereinafter, the similar drug selected in step S111 is referred to as the “selected similar drug”, and its ID as the “selected similar drug ID”.

Next, in step S112 the side effect determination/learning unit 108 obtains the similarity between the selected side effect chosen in step S106 and the keyword group of the “side effects” section of the package insert 202 of the selected similar drug chosen in step S111. The details of step S112 are described later with reference to FIG. 6 as the “side effect similarity calculation process”; an outline follows.

That is, the side effect determination/learning unit 108 acquires from the learning result table 204 the side effect keyword group extracted from the “side effects” section of the package insert 202 of the selected similar drug. The side effect determination/learning unit 108 then has the keyword similarity evaluation unit 109 evaluate, for each keyword in the acquired side effect keyword group, its similarity to the selected side effect.

The side effect determination/learning unit 108 then aggregates the evaluation results to calculate the similarity. In the present embodiment, the higher the similarity, the larger the calculated value. After the similarity is calculated, the process proceeds to step S113.

In step S113, the side effect determination/learning unit 108 determines which of the following ranges the similarity obtained in step S112 falls into: “α₁ or more”, “α₂ or more and less than α₁”, or “less than α₂”. In the present embodiment, α₁ and α₂ are appropriate predetermined thresholds, with α₁ > α₂.

The thresholds α₁ and α₂ are fixed values in the present embodiment, but in some embodiments they may be values that vary with the length of the selected side effect. Moreover, not only in step S113 but in general, a comparison with a threshold may, depending on the embodiment, test “greater than the threshold, or less than or equal to it” or test “greater than or equal to the threshold, or less than it”; either policy may be adopted as appropriate.

The threshold α₁ is a reference value at or above which it is reasonable to judge the selected side effect to be a “known side effect”. The threshold α₂ is a reference value marking the boundary between judging the selected side effect to be an “unknown side effect” and judging it to be a “side effect that may be known”.

When the similarity obtained in step S112 is α₁ or more, the process proceeds from step S113 to step S114. When the similarity obtained in step S112 is α₂ or more and less than α₁, the process proceeds from step S113 to step S115. When the similarity obtained in step S112 is less than α₂, the process returns from step S113 to step S110.
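The three-way branch of step S113 can be written as a small function. This is a sketch that follows the “or more / less than” comparison of the present embodiment; the function name, the returned step labels, and the numeric thresholds in the usage lines are illustrative assumptions, since the text gives no concrete values for α₁ and α₂.

```python
def branch_on_similarity(similarity: float, alpha1: float, alpha2: float) -> str:
    """Return the step that follows S113 for a given similarity value."""
    if not alpha1 > alpha2:
        raise ValueError("alpha1 must be greater than alpha2")
    if similarity >= alpha1:
        return "S114"  # add the selected similar drug ID to the learning list
    if similarity >= alpha2:
        return "S115"  # add the selected similar drug ID to the candidate list
    return "S110"      # add it to neither list and move on


# Hypothetical thresholds, chosen only to exercise the three ranges.
ALPHA1, ALPHA2 = 0.8, 0.5
next_steps = [branch_on_similarity(s, ALPHA1, ALPHA2) for s in (0.9, 0.6, 0.4)]
```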

As the outline of step S112 above shows, the side effect determination/learning unit 108 is an example of a comparison target set acquisition means that acquires, as a comparison target phrase set, the set of phrases contained in the side effect description part of the package insert of a drug recognized as a similar drug. Specifically, the side effect determination/learning unit 108, acting as the comparison target set acquisition means, acquires the comparison target phrase set by reading, from the learning result table 204 in the storage unit 101, the side effect learning result information concerning the drug recognized as a similar drug. This side effect learning result information corresponds to an entry that associates a set of phrases, obtained in advance by phrase extraction from the side effect description part of the package insert 202 of the drug recognized as a similar drug, with the identification information of that drug.

As is also clear from the above description of steps S112 and S113, the side effect determination/learning unit 108 is likewise an example of a determination means that determines whether the side effect indicated by the determination target side effect phrase is a known side effect of a similar drug.

That is, for each of at least some of the similar drugs, the side effect determination/learning unit 108 has the keyword similarity evaluation unit 109 evaluate each combination of a phrase in the comparison target phrase set acquired for that similar drug with the determination target side effect phrase. In other words, for each of at least some of the similar drugs, the side effect determination/learning unit 108 has the keyword similarity evaluation unit 109 evaluate each combination of a keyword in the side effect keyword group acquired for that similar drug with the selected side effect. The side effect determination/learning unit 108 then uses the results of the evaluation by the keyword similarity evaluation unit 109 and the threshold α₁ to determine whether the side effect indicated by the determination target side effect phrase is a known side effect of that similar drug.

In step S114, the side effect determination/learning unit 108 adds the selected similar drug ID to the known side effect learning list. That is, the side effect determination/learning unit 108 records that “the selected side effect is a known side effect of the selected similar drug”. The process then returns to step S110.

In step S115, the side effect determination/learning unit 108 adds the selected similar drug ID to the known side effect candidate list. That is, the side effect determination/learning unit 108 records that “the selected side effect resembles, to some degree, a known side effect of the selected similar drug, and so may itself be a known side effect”. The process then returns to step S110.

In this way, the side effect determination/learning unit 108 executes the processing from step S111 onward for each similar drug with respect to the selected side effect currently under consideration, which completes the known side effect learning list and the known side effect candidate list for that selected side effect. Once both lists are complete, the process proceeds from step S110 to step S116, as described above.
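The loop over similar drugs in steps S109 to S115 can be sketched as follows. The similarity calculation of step S112 is described separately with FIG. 6, so it is passed in as a function here; the stand-in `exact_match_similarity` is purely a placeholder assumption (1.0 on an exact match, 0.0 otherwise) and is not the calculation of the embodiment. All names and the threshold values are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def compare_with_similar_drugs(
    selected_side_effect: str,
    similar_drug_ids: List[str],
    side_effect_keywords: Dict[str, List[str]],
    similarity: Callable[[str, List[str]], float],
    alpha1: float,
    alpha2: float,
) -> Tuple[List[str], List[str]]:
    """Build the known side effect learning list and candidate list."""
    learning_list: List[str] = []   # S109: both lists start empty
    candidate_list: List[str] = []
    for drug_id in similar_drug_ids:                 # S110 and S111
        keywords = side_effect_keywords.get(drug_id, [])
        score = similarity(selected_side_effect, keywords)  # S112
        if score >= alpha1:                          # S113
            learning_list.append(drug_id)            # S114
        elif score >= alpha2:
            candidate_list.append(drug_id)           # S115
    return learning_list, candidate_list


def exact_match_similarity(selected: str, keywords: List[str]) -> float:
    """Placeholder only: NOT the FIG. 6 side effect similarity calculation."""
    return 1.0 if selected in keywords else 0.0


# Side effect keyword groups of entries (a1), (a3), and (a4) in FIG. 4.
keyword_groups = {
    "111222A3333": ["headache", "pulmonary hemorrhage"],
    "444555A7777": ["anemia", "pulmonary hemorrhage"],
    "777888C9090": ["heart failure", "gastric ulcer"],
}
result = compare_with_similar_drugs(
    "headache", list(keyword_groups), keyword_groups,
    exact_match_similarity, alpha1=0.8, alpha2=0.5)
```

With the placeholder similarity, “headache” places only “111222A3333” in the learning list and leaves the candidate list empty, which agrees with the first example described for step S116.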

ステップＳ１１６で副作用判定・学習部１０８は、選択副作用について結果を出力し、選択副作用が既知か否かを既知副作用学習リストにしたがって学習する。図５を再び参照してステップＳ１１６の動作について具体的に２つの例を挙げて説明すれば、次のとおりである。 In step S116, the side effect determination / learning unit 108 outputs a result regarding the selected side effect, and learns whether the selected side effect is known according to the known side effect learning list. With reference to FIG. 5 again, the operation of step S116 will be specifically described with reference to two examples.

第１の例として、ステップＳ１０６で選択副作用として「頭痛」が選択された場合を説明する。また、上記のとおり、判定対象薬ＩＤ２０６は「９９８８７７Ｆ５０５０」であり、学習結果テーブル２０４は図４に示すとおりであるとする。 As a first example, a case where “headache” is selected as the selected side effect in step S106 will be described. Further, as described above, it is assumed that the determination target drug ID 206 is “998877F5050” and the learning result table 204 is as shown in FIG.

すると、選択類薬ＩＤが「１１１２２２Ａ３３３３」のとき、上記（ａ１）で説明した図４のエントリの副作用キーワード群の中に、選択副作用（すなわち「頭痛」）と一致するキーワードがある。よって、閾値α_１が適切に定められていれば、「１１１２２２Ａ３３３３」という選択類薬ＩＤが既知副作用学習リストに追加される。 Then, when the selected similar drug ID is “111222A3333”, the side effect keyword group of the entry of FIG. 4 described in (a1) above contains a keyword that matches the selected side effect (that is, “headache”). Therefore, if the threshold α₁ is set appropriately, the selected similar drug ID “111222A3333” is added to the known side effect learning list.

他方、上記（ａ３）と（ａ４）で説明した図４のエントリの副作用キーワード群の中には、「頭痛」と類似するキーワードがない。よって、閾値α_１とα_２が適切に定められていれば、選択類薬ＩＤが「４４４５５５Ａ７７７７」又は「７７７８８８Ｃ９０９０」のとき、選択類薬ＩＤは既知副作用学習リストにも既知副作用候補リストにも追加されない。 On the other hand, the side effect keyword groups of the entries of FIG. 4 described in (a3) and (a4) above contain no keyword similar to “headache”. Therefore, if the thresholds α₁ and α₂ are set appropriately, when the selected similar drug ID is “444555A7777” or “777888C9090”, the ID is added to neither the known side effect learning list nor the known side effect candidate list.

よって、この第１の例においては、ステップＳ１１６で副作用判定・学習部１０８は、「頭痛」という選択副作用についての判定結果として、以下のように図５の判定結果一覧表４０２の２行目の出力を行う。 Therefore, in this first example, in step S116 the side effect determination / learning unit 108 outputs the second row of the determination result list 402 in FIG. 5, as follows, as the determination result for the selected side effect “headache”.

すなわち、副作用判定・学習部１０８は、「副作用」の列に選択副作用を示す「頭痛」というキーワードを出力し、「学習結果による判定」の列に「未知」と出力する。そして、「１１１２２２Ａ３３３３」というＩＤが既知副作用学習リストに含まれることから、副作用判定・学習部１０８は、「類薬との比較による判定」として、「１１１２２２Ａ３３３３」というＩＤの列に「既知」と出力する。 That is, the side effect determination / learning unit 108 outputs the keyword “headache” indicating the selected side effect in the “side effect” column, and outputs “unknown” in the “determination based on learning result” column. Since the ID “111222A3333” is included in the known side effect learning list, the side effect determination / learning unit 108 outputs “known” in the column for the ID “111222A3333” as the “determination based on comparison with similar drugs”.

他方、「４４４５５５Ａ７７７７」と「７７７８８８Ｃ９０９０」というＩＤは既知副作用学習リストにも既知副作用候補リストにも含まれない。よって、副作用判定・学習部１０８は、「類薬との比較による判定」として、これら２つのＩＤの列には「未知」と出力する。 On the other hand, the IDs “444555A7777” and “777888C9090” are not included in the known side effect learning list or the known side effect candidate list. Therefore, the side effect determination / learning unit 108 outputs “unknown” to these two ID columns as “determination based on comparison with similar drugs”.

また、この第１の例では、既知副作用学習リストが空ではないので、選択副作用は既知の副作用として判定されたことになる。したがって、ユーザからの指示は不要である。そこで、副作用判定・学習部１０８は、判断不要を示す「−」を、「判断入力欄」列に出力する。 In the first example, since the known side effect learning list is not empty, the selected side effect is determined as a known side effect. Therefore, no instruction from the user is required. Therefore, the side effect determination / learning unit 108 outputs “-” indicating that the determination is unnecessary to the “determination input column” column.

続いて、図３のステップＳ１１６についての第２の例として、ステップＳ１０６で選択副作用として「脳出血」が選択された場合を説明する。なお、上記のとおり、判定対象薬ＩＤ２０６は「９９８８７７Ｆ５０５０」であり、学習結果テーブル２０４は図４に示すとおりであるとする。 Subsequently, a case where “cerebral hemorrhage” is selected as a selected side effect in step S106 will be described as a second example of step S116 in FIG. As described above, the determination target drug ID 206 is “998877F5050”, and the learning result table 204 is as shown in FIG.

すると、選択類薬ＩＤが「１１１２２２Ａ３３３３」のとき、上記（ａ１）で説明した図４のエントリの副作用キーワード群の中に、「脳出血」と中程度に類似する（つまり「脳出血」と部分的に一致する）「肺出血」というキーワードが見つかる。また、選択類薬ＩＤが「４４４５５５Ａ７７７７」の場合も、同様に、選択類薬の副作用キーワード群の中に、「肺出血」というキーワードが見つかる。よって、閾値α_１とα_２が適切に定められていれば、「１１１２２２Ａ３３３３」と「４４４５５５Ａ７７７７」というＩＤが既知副作用候補リストに追加される。 Then, when the selected similar drug ID is “111222A3333”, the keyword “pulmonary hemorrhage” (肺出血), which is moderately similar to “cerebral hemorrhage” (脳出血) in that it partially matches it, is found in the side effect keyword group of the entry of FIG. 4 described in (a1) above. Similarly, when the selected similar drug ID is “444555A7777”, the keyword “pulmonary hemorrhage” is found in the side effect keyword group of the selected similar drug. Therefore, if the thresholds α₁ and α₂ are set appropriately, the IDs “111222A3333” and “444555A7777” are added to the known side effect candidate list.

他方、選択類薬ＩＤが「７７７８８８Ｃ９０９０」のとき、選択類薬ＩＤは既知副作用学習リストにも既知副作用候補リストにも追加されない。なぜなら、当該選択類薬に対応する図４のエントリ（つまり上記（ａ４）で説明したエントリ）の副作用キーワード群には、「脳出血」との類似度が閾値α_２に満たないキーワードしか含まれていないためである。 On the other hand, when the selected similar drug ID is “777888C9090”, the ID is added to neither the known side effect learning list nor the known side effect candidate list. This is because the side effect keyword group of the entry of FIG. 4 corresponding to that selected similar drug (that is, the entry described in (a4) above) contains only keywords whose similarity to “cerebral hemorrhage” falls short of the threshold α₂.

よって、この第２の例においては、図３のステップＳ１１６で副作用判定・学習部１０８は、「脳出血」という選択副作用についての判定結果として、以下のように図５の判定結果一覧表４０２の３行目の出力を行う。 Therefore, in this second example, in step S116 of FIG. 3 the side effect determination / learning unit 108 outputs the third row of the determination result list 402 in FIG. 5, as follows, as the determination result for the selected side effect “cerebral hemorrhage”.

すなわち、副作用判定・学習部１０８は、「副作用」の列に選択副作用を示す「脳出血」というキーワードを出力し、「学習結果による判定」の列に「未知」と出力する。そして、「１１１２２２Ａ３３３３」というＩＤが既知副作用候補リストに含まれることから、副作用判定・学習部１０８は、「類薬との比較による判定」として、「１１１２２２Ａ３３３３」というＩＤの列に「既知候補」と出力する。同様に、副作用判定・学習部１０８は、「４４４５５５Ａ７７７７」というＩＤの列にも「既知候補」と出力する。 That is, the side effect determination / learning unit 108 outputs the keyword “cerebral hemorrhage” indicating the selected side effect in the “side effect” column, and outputs “unknown” in the “determination based on learning result” column. Since the ID “111222A3333” is included in the known side effect candidate list, the side effect determination / learning unit 108 outputs “known candidate” in the column for the ID “111222A3333” as the “determination based on comparison with similar drugs”. Similarly, the side effect determination / learning unit 108 outputs “known candidate” in the column for the ID “444555A7777”.

また、この第２の例では、既知副作用学習リストが空なので、副作用判定・学習部１０８は「選択副作用は既知の副作用である」と断定することができない。他方で、既知副作用候補リストが空ではないので、副作用判定・学習部１０８は「選択副作用は未知の副作用である」とも断定することができない。 In the second example, since the known side effect learning list is empty, the side effect determination / learning unit 108 cannot determine that “the selected side effect is a known side effect”. On the other hand, since the known side effect candidate list is not empty, the side effect determination / learning unit 108 cannot determine that “the selected side effect is an unknown side effect”.

そこで、副作用判定・学習部１０８は、選択副作用が既知の副作用か否かをユーザに判断させるために、現在注目している選択副作用の行において、「判断入力欄」列に、「既知」と「未知」の２択用のラジオボタン４０３を表示する。もちろん、実施形態によっては、ユーザからの入力を受け付けるためのユーザインタフェースとして、ラジオボタン以外のもの（例えばチェックボックスやプルダウンリストなど）が使われてもよい。 Therefore, the side effect determination / learning unit 108 sets “known” in the “determination input field” column in the row of the selected side effect to which attention is currently given in order to make the user determine whether the selected side effect is a known side effect. A radio button 403 for selecting “Unknown” is displayed. Of course, depending on the embodiment, a user interface other than a radio button (for example, a check box or a pull-down list) may be used as a user interface for receiving input from the user.

以上、第１と第２の例を用いて説明したように、副作用判定・学習部１０８はステップＳ１１６において、各類薬について、当該類薬のＩＤが既知副作用学習リストに含まれていれば「既知」と出力し、既知副作用候補リストに含まれていれば「既知候補」と出力する。もし、当該類薬のＩＤが既知副作用学習リストにも既知副作用候補リストにも含まれていなければ、副作用判定・学習部１０８は「未知」と出力する。 As described above, as described with reference to the first and second examples, the side effect determination / learning unit 108 determines in step S116 that, for each analog, the ID of the analog is included in the known side effect learning list. “Known” is output, and if it is included in the known side effect candidate list, “Known candidate” is output. If the ID of the related drug is not included in the known side effect learning list or the known side effect candidate list, the side effect determination / learning unit 108 outputs “unknown”.

そして、副作用判定・学習部１０８は、既知副作用候補リストが空ではなく、かつ既知副作用学習リストが空のときに、「判断入力欄」列にラジオボタン４０３を表示する。それ以外の場合は、副作用判定・学習部１０８が「選択副作用は既知の副作用である」又は「選択副作用は未知の副作用である」と断定することができたということなので、副作用判定・学習部１０８はラジオボタン４０３を表示しない。 Then, when the known side effect candidate list is not empty and the known side effect learning list is empty, the side effect determination / learning unit 108 displays a radio button 403 in the “determination input field” column. In other cases, the side effect determination / learning unit 108 can determine that “the selected side effect is a known side effect” or “the selected side effect is an unknown side effect”. 108 does not display the radio button 403.

また、ステップＳ１１６で副作用判定・学習部１０８はさらに、選択副作用が既知か否かの判定結果を学習する。具体的には、副作用判定・学習部１０８は、既知副作用学習リストが空か否かを判定し、既知副作用学習リストが空でなければ、学習結果テーブル２０４において判定対象薬ＩＤ２０６をＩＤとして有するエントリの既知副作用リストに、選択副作用を追加する。なお、既知副作用学習リストが空の場合は、副作用判定・学習部１０８は学習結果テーブル２０４の既知副作用リストの更新を行わない。 In step S116, the side effect determination / learning unit 108 further learns the determination result of whether or not the selected side effect is known. Specifically, the side effect determination / learning unit 108 determines whether or not the known side effect learning list is empty, and if the known side effect learning list is not empty, the entry having the determination target drug ID 206 as an ID in the learning result table 204 Add the selected side effect to the list of known side effects. When the known side effect learning list is empty, the side effect determination / learning unit 108 does not update the known side effect list in the learning result table 204.
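
The behavior of step S116 summarized above can be sketched in Python as follows. This is an illustration only, not the disclosed implementation. The function name `step_s116`, its argument layout, and the use of a Python set for the known side effect list of the determination target drug are hypothetical.

```python
def step_s116(side_effect, similar_drug_ids, known_learning, known_candidates,
              target_known_side_effects):
    """Produce one row of the result list and learn the result if possible.

    Returns (per_drug_result, show_radio_button).
    target_known_side_effects stands in for the known side effect list of the
    determination target drug in the learning result table 204.
    """
    per_drug_result = {}
    for drug_id in similar_drug_ids:
        if drug_id in known_learning:
            per_drug_result[drug_id] = "known"
        elif drug_id in known_candidates:
            per_drug_result[drug_id] = "known candidate"
        else:
            per_drug_result[drug_id] = "unknown"

    # The user is asked only when there are candidates but no definite match.
    show_radio_button = bool(known_candidates) and not known_learning

    # Learning: the table is updated only when the learning list is non-empty.
    if known_learning:
        target_known_side_effects.add(side_effect)

    return per_drug_result, show_radio_button
```

Applied to the two examples in the text, “headache” yields “known” for “111222A3333” and no radio button, while “cerebral hemorrhage” yields “known candidate” for “111222A3333” and “444555A7777” and does display the radio button.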

以上のようにしてステップＳ１１６において、判定結果一覧表４０２の選択副作用の行の出力と必要に応じた学習結果テーブル２０４の学習が行われると、処理はステップＳ１０５に戻る。 As described above, when the output of the selected side effect row in the determination result list 402 and the learning result table 204 are learned as necessary in step S116, the process returns to step S105.

そして、ステップＳ１０５において、安全性情報報告文書２０５で報告された副作用についてすべて処理済みであると判断すると、続いて副作用判定・学習部１０８はステップＳ１１７の処理を実行する。すなわち、ステップＳ１１７で副作用判定・学習部１０８は、ユーザによる判断が必要か否かを判断する。 If it is determined in step S105 that all the side effects reported in the safety information report document 205 have been processed, then the side effect determination / learning unit 108 executes the process of step S117. That is, in step S117, the side effect determination / learning unit 108 determines whether determination by the user is necessary.

例えば、副作用判定・学習部１０８は、ステップＳ１１６においてラジオボタン４０３を表示したことが１回でもあったか否かを記憶しておいてもよい。そして、ラジオボタン４０３を表示したことがあれば、副作用判定・学習部１０８は、「ユーザによる判断が必要である」とステップＳ１１７で判断してもよい。 For example, the side effect determination / learning unit 108 may store whether or not the radio button 403 is displayed once in step S116. If the radio button 403 has been displayed, the side effect determination / learning unit 108 may determine in step S117 that “the user needs to make a determination”.

ユーザによる判断が不要の場合とは、すなわち、安全性情報報告文書２０５から抽出されたすべての副作用について、既知又は未知と副作用判定・学習部１０８が断定することができた場合である。この場合、図３の処理も終了する。他方、「ユーザによる判断が必要である」と副作用判定・学習部１０８が判断した場合は、処理はステップＳ１１８に移行する。 The case where determination by the user is unnecessary is a case where the side effect determination / learning unit 108 can determine that all side effects extracted from the safety information report document 205 are known or unknown. In this case, the process of FIG. 3 is also terminated. On the other hand, if the side effect determination / learning unit 108 determines that “the user needs to make a determination”, the process proceeds to step S118.

そして、ステップＳ１１８で副作用判定・学習部１０８は、既知副作用の候補に関するユーザからの入力を受け付け、入力内容を学習する。 In step S118, the side effect determination / learning unit 108 receives input from the user regarding the known side effect candidate, and learns the input content.

ここで図５を再び参照して具体的に説明すると、副作用判定・学習部１０８は、ステップＳ１１８において、学習ボタン４０４を副作用判定結果画面４００に表示し、学習ボタン４０４が押下されるまで待機する。なお、図５の例では、学習ボタン４０４に「既知候補の副作用について入力内容を学習」と書かれている。そして、ステップＳ１１８で学習ボタン４０４が押下されると、副作用判定・学習部１０８は、ラジオボタン４０３を介してユーザから入力された内容を学習する。 Specifically, referring again to FIG. 5, the side effect determination / learning unit 108 displays the learning button 404 on the side effect determination result screen 400 in step S118, and waits until the learning button 404 is pressed. . In the example of FIG. 5, the learning button 404 is written as “learn input content about known candidate side effects”. When the learning button 404 is pressed in step S118, the side effect determination / learning unit 108 learns the content input from the user via the radio button 403.

上記のように、ラジオボタン４０３が表示される副作用は、副作用判定・学習部１０８が「既知の副作用と断定することはできないが、既知の副作用の可能性がある」と判断した副作用である。よって、ユーザが、当該副作用が既知か未知かを判断し、ラジオボタン４０３を介して判断の結果を入力し、学習ボタン４０４を押下して入力の確定を行うと、副作用判定・学習部１０８はステップＳ１１８で次のように動作する。 As described above, a side effect for which the radio button 403 is displayed is one that the side effect determination / learning unit 108 has judged as “cannot be asserted to be a known side effect, but may be a known side effect”. Therefore, when the user judges whether the side effect is known or unknown, enters the result via the radio button 403, and presses the learning button 404 to confirm the input, the side effect determination / learning unit 108 operates as follows in step S118.

すなわち、副作用判定・学習部１０８は、ラジオボタン４０３を表示した各副作用について、当該副作用の行のラジオボタン４０３で「既知」と指示された場合、当該副作用を学習結果テーブル２０４に登録する。つまり、副作用判定・学習部１０８は、学習結果テーブル２０４においてＩＤが判定対象薬ＩＤ２０６と一致するエントリの既知副作用リストに、ラジオボタン４０３で「既知」と指示された当該副作用を追加する。なお、ラジオボタン４０３で「未知」と指示された場合は、副作用判定・学習部１０８は当該副作用については特に何もしない。 That is, for each side effect for which the radio button 403 is displayed, the side effect determination / learning unit 108 registers the side effect in the learning result table 204 when the radio button 403 in the side of the side effect indicates “known”. That is, the side effect determination / learning unit 108 adds the side effect indicated as “known” by the radio button 403 to the known side effect list of the entry whose ID matches the determination target drug ID 206 in the learning result table 204. When the radio button 403 instructs “unknown”, the side effect determination / learning unit 108 does nothing particularly about the side effect.

例えば、図５の例の場合、安全性情報報告文書２０５から抽出された「脳出血」というキーワードに関して、ラジオボタン４０３が表示されている。ここで、仮に、ラジオボタン４０３で「既知」と指示されて学習ボタン４０４が押下されたとする。すると、副作用判定・学習部１０８は、図４の学習結果テーブル２０４においてＩＤが「９９８８７７Ｆ５０５０」であるエントリの既知副作用リストに「脳出血」というキーワードを追加する。逆に、ラジオボタン４０３で「未知」と指示されて学習ボタン４０４が押下されたとすると、副作用判定・学習部１０８は、図４の学習結果テーブル２０４においてＩＤが「９９８８７７Ｆ５０５０」であるエントリの既知副作用リストの更新を行わない。 For example, in the case of the example in FIG. 5, a radio button 403 is displayed for the keyword “cerebral hemorrhage” extracted from the safety information report document 205. Here, it is assumed that the radio button 403 indicates “known” and the learning button 404 is pressed. Then, the side effect determination / learning unit 108 adds the keyword “cerebral hemorrhage” to the known side effect list of the entry whose ID is “998877F5050” in the learning result table 204 of FIG. On the other hand, if the radio button 403 indicates “unknown” and the learning button 404 is pressed, the side effect determination / learning unit 108 knows the known side effect of the entry whose ID is “998877F5050” in the learning result table 204 of FIG. Do not update the list.

以上のようにしてステップＳ１１８が終了すると、図３の処理も終了する。 When step S118 ends as described above, the processing in FIG. 3 also ends.
続いて、図３のステップＳ１１２で実行される副作用類似度算出処理の詳細について、図６〜１０を参照して説明する。 Next, details of the side effect similarity calculation process executed in step S112 of FIG. 3 will be described with reference to FIGS. 6 to 10.

図６は、副作用類似度算出処理のフローチャートである。図３のステップＳ１１２に関して述べたように、副作用類似度算出処理は、ある選択副作用と選択類薬ＩＤの組み合わせに対して類似度を求める処理である。 FIG. 6 is a flowchart of the side effect similarity calculation process. As described with reference to step S112 in FIG. 3, the side effect similarity calculation process is a process for obtaining a similarity for a combination of a certain selected side effect and a selected drug ID.

ステップＳ２０１で副作用判定・学習部１０８は、３つの変数Ｐ_１、Ｐ_２、Ｐ_３を初期化してＮＵＬＬとする。３つの変数Ｐ_１、Ｐ_２、Ｐ_３は、選択類薬の副作用キーワード群の中で選択副作用との類似度が一定の基準を満たすキーワードのうちで上位３位に入るものの類似度の点数を記憶するための変数である。 In step S201, the side effect determination / learning unit 108 initializes three variables P₁, P₂, and P₃ to NULL. These three variables store the similarity scores of the top three keywords among those keywords in the side effect keyword group of the selected similar drug whose similarity to the selected side effect satisfies a certain criterion.

なお、実施形態によっては、点数としては使われない特定の値（例えば−１など）を、変数Ｐ_１、Ｐ_２、Ｐ_３の初期値として用いることもできる。また、３つの変数Ｐ_１、Ｐ_２、Ｐ_３はそれぞれ１位、２位、３位の点数に対応する。 In some embodiments, a specific value that is not used as a score (for example, −1) can be used as an initial value of the variables P ₁ , P ₂ , and P ₃ . The three variables P ₁ , P ₂ , and P ₃ correspond to the first, second, and third rank points, respectively.

そして、次のステップＳ２０２で副作用判定・学習部１０８は、選択類薬の副作用キーワード群を取得する。すなわち、副作用判定・学習部１０８は、学習結果テーブル２０４においてＩＤが選択類薬ＩＤと一致するエントリを検索し、見つかったエントリの副作用キーワード群を取得する。そして、処理はステップＳ２０３に移行する。 Then, in the next step S202, the side effect determination / learning unit 108 acquires a side effect keyword group of the selected drug. That is, the side effect determination / learning unit 108 searches the learning result table 204 for an entry whose ID matches the selected analogy drug ID, and acquires a side effect keyword group of the found entry. Then, the process proceeds to step S203.

ステップＳ２０３で副作用判定・学習部１０８は、ステップＳ２０２で取得した副作用キーワード群の中で、ステップＳ２０４以降の処理を行っていない未処理のものが残っているか否かを判断する。未処理のキーワードが残っていれば、処理はステップＳ２０４に移行し、すべてのキーワードが処理済みならば、処理はステップＳ２１０に移行する。 In step S203, the side effect determination / learning unit 108 determines whether or not an unprocessed one that has not been processed in step S204 and subsequent steps remains in the side effect keyword group acquired in step S202. If unprocessed keywords remain, the process proceeds to step S204. If all keywords have been processed, the process proceeds to step S210.

ステップＳ２０４で副作用判定・学習部１０８は、ステップＳ２０２で取得した副作用キーワード群の中で、ステップＳ２０４以降の処理を行っていない未処理のキーワードのうちの任意の１つを選択する。以下、ステップＳ２０４で選択されたキーワードを「選択キーワード」という。 In step S204, the side effect determination / learning unit 108 selects any one of unprocessed keywords that have not been processed in step S204 and subsequent steps from the side effect keyword group acquired in step S202. Hereinafter, the keyword selected in step S204 is referred to as “selected keyword”.

そして、次のステップＳ２０５で副作用判定・学習部１０８は、選択副作用と選択キーワードの類似度をキーワード類似度評価部１０９に評価させ、評価結果の点数を得る。ステップＳ２０５の詳細は「点数計算処理」として図７〜９とともに後述する。 In the next step S205, the side effect determination / learning unit 108 causes the keyword similarity evaluation unit 109 to evaluate the similarity between the selected side effect and the selected keyword, and obtains the score of the evaluation result. Details of step S205 will be described later as the “score calculation process” with reference to FIGS. 7 to 9.

続いて、ステップＳ２０６で副作用判定・学習部１０８は、ステップＳ２０５で得た点数が、「選択副作用と選択キーワードは一致する」と見なしてよいことを示す所定の基準を満たすか否かを判断する。なお、当該基準については、点数計算処理の詳細とあわせて、図８を参照して後述する。 Subsequently, in step S206, the side effect determination / learning unit 108 determines whether or not the score obtained in step S205 satisfies a predetermined criterion indicating that “the selected side effect matches the selected keyword”. . The reference will be described later with reference to FIG. 8 together with the details of the score calculation process.

ステップＳ２０５で得た点数が基準を満たすとき、処理はステップＳ２０７に移行する。他方、ステップＳ２０５で得た点数が基準を満たさないとき、処理はステップＳ２０３に戻る。 When the score obtained in step S205 satisfies the criterion, the process proceeds to step S207. On the other hand, when the score obtained in step S205 does not satisfy the criterion, the process returns to step S203.

ステップＳ２０７が実行されるのは、上記のように、選択副作用と選択キーワードが一致すると見なしてよい場合である。そこで、ステップＳ２０７で副作用判定・学習部１０８は、選択副作用と選択キーワードを同義語として学習する。すなわち、副作用判定・学習部１０８は、選択副作用と選択キーワードを対にしたエントリを同義語辞書２０３に追加する。 Step S207 is executed when the selected side effect and the selected keyword may be considered to match as described above. In step S207, the side effect determination / learning unit 108 learns the selected side effect and the selected keyword as synonyms. That is, the side effect determination / learning unit 108 adds an entry in which the selected side effect and the selected keyword are paired to the synonym dictionary 203.

そして、次のステップＳ２０８で副作用判定・学習部１０８は、ステップＳ２０５で得た点数が、選択副作用と一致すると見なせるキーワードに関して今までに得られた点数の中で上位３位以内に入る点数か否かを判断する。 Then, in the next step S208, the side effect determination / learning unit 108 determines whether the score obtained in step S205 falls within the top three points among the scores obtained so far for keywords that can be considered to match the selected side effect. Determine whether.

具体的には、副作用判定・学習部１０８は、変数Ｐ_１〜Ｐ_３のうち１つでも初期状態のＮＵＬＬのままのものがあれば、「ステップＳ２０５で得た点数は上位３位以内」と判断する。また、変数Ｐ_１〜Ｐ_３にすべて具体的な値が設定済みの場合、副作用判定・学習部１０８は、変数Ｐ_３の値（つまり３位の点数）よりステップＳ２０５で得た点数が大きければ、「ステップＳ２０５で得た点数は上位３位以内」と判断する。 Specifically, if at least one of the variables P₁ to P₃ is still NULL as initialized, the side effect determination / learning unit 108 determines that “the score obtained in step S205 is within the top three”. If specific values have already been set for all of the variables P₁ to P₃, the side effect determination / learning unit 108 determines that “the score obtained in step S205 is within the top three” when that score is larger than the value of the variable P₃ (that is, the third-place score).

逆に、変数Ｐ_１〜Ｐ_３にすべて具体的な値が設定済みで、かつステップＳ２０５で得た点数が変数Ｐ_３の値以下であれば、副作用判定・学習部１０８は、「ステップＳ２０５で得た点数は上位３位以内ではない」と判断する。 Conversely, if specific values have already been set for all of the variables P₁ to P₃ and the score obtained in step S205 is equal to or smaller than the value of the variable P₃, the side effect determination / learning unit 108 determines that “the score obtained in step S205 is not within the top three”.

そして、副作用判定・学習部１０８が「ステップＳ２０５で得た点数は上位３位以内」と判断した場合、処理はステップＳ２０９に移行する。それ以外の場合、処理はステップＳ２０３に戻る。 If the side effect determination / learning unit 108 determines that “the score obtained in step S205 is within the top three”, the process proceeds to step S209. Otherwise, the process returns to step S203.

ステップＳ２０９で副作用判定・学習部１０８は、ステップＳ２０５で得た点数に応じて、適宜変数Ｐ_１〜Ｐ_３を更新する。 In step S209, the side effect determination / learning unit 108 updates the variables P₁ to P₃ as appropriate in accordance with the score obtained in step S205.
具体的には、変数Ｐ_１がＮＵＬＬの場合、副作用判定・学習部１０８は、ステップＳ２０５で得た点数を変数Ｐ_１に代入する。また、変数Ｐ_１がＮＵＬＬではなく、変数Ｐ_２がＮＵＬＬの場合、副作用判定・学習部１０８は、ステップＳ２０５で得た点数を変数Ｐ_２に代入する。そして、変数Ｐ_１とＰ_２がＮＵＬＬではなく、変数Ｐ_３がＮＵＬＬの場合、副作用判定・学習部１０８は、ステップＳ２０５で得た点数を変数Ｐ_３に代入する。 Specifically, when the variable P₁ is NULL, the side effect determination / learning unit 108 assigns the score obtained in step S205 to the variable P₁. When the variable P₁ is not NULL and the variable P₂ is NULL, the side effect determination / learning unit 108 assigns the score obtained in step S205 to the variable P₂. When the variables P₁ and P₂ are not NULL and the variable P₃ is NULL, the side effect determination / learning unit 108 assigns the score obtained in step S205 to the variable P₃.

他方、変数Ｐ_１〜Ｐ_３のすべてに具体的な値が設定されている場合、副作用判定・学習部１０８は次のように変数の更新を行う。 On the other hand, when specific values are set for all of the variables P₁ to P₃, the side effect determination / learning unit 108 updates the variables as follows.
すなわち、ステップＳ２０５で得た点数が変数Ｐ_１の値より大きい場合、副作用判定・学習部１０８は、変数Ｐ_３に現在の変数Ｐ_２の値を代入し、変数Ｐ_２に現在の変数Ｐ_１の値を代入し、変数Ｐ_１にステップＳ２０５で得た点数を代入する。あるいは、ステップＳ２０５で得た点数が変数Ｐ_１の値以下で、かつ変数Ｐ_２の値より大きい場合、副作用判定・学習部１０８は、変数Ｐ_３に現在の変数Ｐ_２の値を代入し、変数Ｐ_２にステップＳ２０５で得た点数を代入する。あるいは、ステップＳ２０５で得た点数が変数Ｐ_２の値以下で、かつ変数Ｐ_３の値より大きい場合、副作用判定・学習部１０８は、変数Ｐ_３にステップＳ２０５で得た点数を代入する。 That is, when the score obtained in step S205 is larger than the value of the variable P₁, the side effect determination / learning unit 108 assigns the current value of P₂ to P₃, assigns the current value of P₁ to P₂, and assigns the score obtained in step S205 to P₁. When the score obtained in step S205 is equal to or smaller than the value of P₁ and larger than the value of P₂, the side effect determination / learning unit 108 assigns the current value of P₂ to P₃ and assigns the score obtained in step S205 to P₂. When the score obtained in step S205 is equal to or smaller than the value of P₂ and larger than the value of P₃, the side effect determination / learning unit 108 assigns the score obtained in step S205 to P₃.
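
The top-three bookkeeping of steps S208 and S209 can be sketched in Python as below. This is only an illustration. The function name `update_top3` and the use of a three-element list with `None` standing in for NULL are hypothetical. The sketch follows the text literally, so while any slot is still empty the new score is simply stored in the first empty slot without re-sorting.

```python
def update_top3(top3, score):
    """Return the updated [P1, P2, P3] after seeing one more score.

    top3 is a list of three entries; None plays the role of NULL.
    """
    p1, p2, p3 = top3
    # Fill phase: store the score in the first slot that is still NULL.
    if p1 is None:
        return [score, p2, p3]
    if p2 is None:
        return [p1, score, p3]
    if p3 is None:
        return [p1, p2, score]
    # All three slots hold values: shift lower ranks down as needed.
    if score > p1:
        return [score, p1, p2]
    if score > p2:
        return [p1, score, p2]
    if score > p3:
        return [p1, p2, score]
    # Score is not within the top three (step S208 answers "no").
    return [p1, p2, p3]
```

Starting from `[None, None, None]` and feeding the scores 9, 5, 3 gives `[9, 5, 3]`; a further score of 7 gives `[9, 7, 5]`, and a score of 2 leaves the list unchanged.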

以上のようにして変数Ｐ_１〜Ｐ_３の更新が終了すると、処理はステップＳ２０３に戻る。 When the updating of the variables P₁ to P₃ is completed as described above, the process returns to step S203.
また、ステップＳ２１０で副作用判定・学習部１０８は、変数Ｐ_１とＰ_２とＰ_３を引数として用いて、図１０とともに後述する点数正規化処理を行い、正規化した点数を算出する。そして、ステップＳ２１１で副作用判定・学習部１０８は、正規化した点数を図６の処理の戻り値として返し、図６の処理は終了する。つまり、図６の処理に相当する図３のステップＳ１１２において、副作用判定・学習部１０８は、上記の正規化した点数を類似度として取得する。 In step S210, the side effect determination / learning unit 108 uses the variables P₁, P₂, and P₃ as arguments to perform the score normalization process described later with reference to FIG. 10, and calculates a normalized score. In step S211, the side effect determination / learning unit 108 returns the normalized score as the return value of the process in FIG. 6, and the process in FIG. 6 ends. That is, in step S112 of FIG. 3, which corresponds to the process of FIG. 6, the side effect determination / learning unit 108 acquires this normalized score as the similarity.

図７は、点数計算処理のフローチャートである。図６のステップＳ２０５に関して述べたように、点数計算処理はキーワード類似度評価部１０９が２つのキーワードの類似度を示す点数を計算する処理である。以下、図７の説明においては、キーワード類似度評価部１０９に指定される２つのキーワードを「キーワードＡ」及び「キーワードＢ」という。 FIG. 7 is a flowchart of the score calculation process. As described with reference to step S205 in FIG. 6, the score calculation process is a process in which the keyword similarity evaluation unit 109 calculates a score indicating the similarity between two keywords. Hereinafter, in the description of FIG. 7, the two keywords designated by the keyword similarity evaluation unit 109 are referred to as “keyword A” and “keyword B”.

ステップＳ３０１でキーワード類似度評価部１０９は、キーワードＡとＢの類似度を示す変数ｍａｘに０という初期値を代入する。 In step S301, the keyword similarity evaluation unit 109 assigns an initial value of 0 to a variable max indicating the similarity between the keywords A and B.
続くステップＳ３０２でキーワード類似度評価部１０９は、キーワードＡを１つ以上の部分文字列に分割する分割パターンについて、何番目の分割パターンかを数えるための変数ａを１に初期化する。例えば、キーワードＡが「脳出血」であるとし、分割箇所を「／」で示すことにすると、「脳出血」、「脳／出血」、「脳出／血」、「脳／出／血」という４通りの分割パターンが可能であり、変数ａは１から４まで順に数えるための変数である。 In the subsequent step S302, the keyword similarity evaluation unit 109 initializes to 1 a variable a that counts which division pattern is being used among the patterns that divide the keyword A into one or more partial character strings. For example, if the keyword A is “脳出血” (cerebral hemorrhage) and division points are indicated by “／”, four division patterns are possible: “脳出血”, “脳／出血”, “脳出／血”, and “脳／出／血”. The variable a counts through these patterns in order from 1 to 4.
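
The enumeration of division patterns can be illustrated with a short Python sketch. It is not taken from the disclosure. The function name `division_patterns` is hypothetical, the keyword is split at character boundaries, and the order in which patterns are produced is an arbitrary choice, since the text does not specify how the a-th pattern is numbered. A keyword of n characters has n − 1 possible division points, which gives 2^(n−1) patterns.

```python
from itertools import product


def division_patterns(keyword):
    """Return every way to split keyword into one or more contiguous substrings."""
    if not keyword:
        return []
    patterns = []
    # Each flag says whether to cut between character i-1 and character i.
    for cuts in product([False, True], repeat=len(keyword) - 1):
        parts, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:
                parts.append(keyword[start:i])
                start = i
        parts.append(keyword[start:])  # the final substring after the last cut
        patterns.append(parts)
    return patterns
```

For the three-character keyword “脳出血” this yields four patterns, the same four listed in the text, though possibly in a different order.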

そして、次のステップＳ３０３でキーワード類似度評価部１０９は、キーワードＡのａ番目の分割パターンＱ_ａを生成する。 In the next step S303, the keyword similarity evaluation unit 109 generates the a-th division pattern Q_a of the keyword A.
続いて、ステップＳ３０４でキーワード類似度評価部１０９は、キーワードＢを１つ以上の部分文字列に分割する分割パターンについて、何番目の分割パターンかを数えるための変数ｂを１に初期化する。 Subsequently, in step S304, the keyword similarity evaluation unit 109 initializes to 1 a variable b that counts which division pattern is being used among the patterns that divide the keyword B into one or more partial character strings.

そして、次のステップＳ３０５でキーワード類似度評価部１０９は、キーワードＢのｂ番目の分割パターンＱ_ｂを生成する。 In the next step S305, the keyword similarity evaluation unit 109 generates the b-th division pattern Q_b of the keyword B.
また、次のステップＳ３０６でキーワード類似度評価部１０９は、分割パターンＱ_ａとＱ_ｂを用いたときのキーワードＡとＢの類似度を示す変数ｓｃｏｒｅを０に初期化する。 In the next step S306, the keyword similarity evaluation unit 109 initializes to 0 a variable score indicating the similarity between the keywords A and B when the division patterns Q_a and Q_b are used.

さらに、次のステップＳ３０７でキーワード類似度評価部１０９は、分割パターンＱ_ａ内で注目する部分文字列が何番目のものかを数えるための変数ｊを１に初期化する。 Further, in the next step S307, the keyword similarity evaluation unit 109 initializes to 1 a variable j that counts which partial character string in the division pattern Q_a is currently of interest.
そして、ステップＳ３０８でキーワード類似度評価部１０９内の部分文字列類似度評価部１１０は、分割パターンＱ_ａ内のｊ番目の部分文字列を取得し、変数ｓｕｂｓｔｒに代入する。例えば、分割パターンＱ_ａが「脳／出血」の場合、分割パターンＱ_ａにおける１番目の部分文字列は「脳」であり、２番目の部分文字列は「出血」である。よって、例えばｊ＝１の場合は、変数ｓｕｂｓｔｒは「脳」という部分文字列を示す。 In step S308, the partial character string similarity evaluation unit 110 in the keyword similarity evaluation unit 109 acquires the j-th partial character string in the division pattern Q_a and assigns it to the variable substr. For example, when the division pattern Q_a is “脳／出血”, the first partial character string in Q_a is “脳” (brain) and the second is “出血” (hemorrhage). Therefore, when j = 1, for example, the variable substr indicates the partial character string “脳”.

続いて、ステップＳ３０９で部分文字列類似度評価部１１０は、分割パターンＱ_ｂの中に、部分文字列ｓｕｂｓｔｒと完全一致するか同義語として一致する部分文字列があるか否かを判断する。 Subsequently, in step S309, the partial character string similarity evaluation unit 110 determines whether the division pattern Q_b contains a partial character string that matches the partial character string substr either exactly or as a synonym.

すなわち、部分文字列類似度評価部１１０は、分割パターンＱ_ｂの中に、部分文字列ｓｕｂｓｔｒと完全一致する部分文字列があるか否かを調べる。そして、もし部分文字列ｓｕｂｓｔｒと完全一致する部分文字列が見つかれば、処理はステップＳ３１０に移行する。 That is, the partial character string similarity evaluation unit 110 checks whether the division pattern Q_b contains a partial character string that exactly matches substr. If such a partial character string is found, the process proceeds to step S310.

また、分割パターンＱ_ｂの中に、部分文字列ｓｕｂｓｔｒと完全一致する部分文字列がなかった場合、部分文字列類似度評価部１１０は、同義語辞書２０３を参照し、部分文字列ｓｕｂｓｔｒの同義語が登録されているか否かを調べる。そして、もし、部分文字列ｓｕｂｓｔｒの同義語が登録されていれば、部分文字列類似度評価部１１０は、分割パターンＱ_ｂの中に、部分文字列ｓｕｂｓｔｒの同義語と完全一致する部分文字列があるか否かを調べる。その結果、もし分割パターンＱ_ｂの中に、部分文字列ｓｕｂｓｔｒの同義語と完全一致する部分文字列が見つかれば、処理はステップＳ３１０に移行する。 If the division pattern Q_b contains no partial character string that exactly matches substr, the partial character string similarity evaluation unit 110 refers to the synonym dictionary 203 and checks whether a synonym of substr is registered. If a synonym of substr is registered, the partial character string similarity evaluation unit 110 checks whether the division pattern Q_b contains a partial character string that exactly matches that synonym. If such a partial character string is found in Q_b, the process proceeds to step S310.

他方、部分文字列ｓｕｂｓｔｒの同義語が同義語辞書２０３に登録されていない場合、処理はステップＳ３１１に移行する。また、部分文字列ｓｕｂｓｔｒの同義語が同義語辞書２０３に登録されているが、当該同義語と完全一致する部分文字列が分割パターンＱ_ｂの中には見つからなかった場合にも、処理はステップＳ３１１に移行する。 On the other hand, when no synonym of substr is registered in the synonym dictionary 203, the process proceeds to step S311. The process also proceeds to step S311 when a synonym of substr is registered in the synonym dictionary 203 but no partial character string that exactly matches the synonym is found in the division pattern Q_b.

In step S310, the partial character string similarity evaluation unit 110 obtains the score f(|substr|) corresponding to the length of the partial character string substr (hereinafter written |substr|). In other words, the unit assigns the evaluation f(|substr|) to the match of substr.

The keyword similarity evaluation unit 109 then adds the score f(|substr|) obtained by the partial character string similarity evaluation unit 110 to the variable score. That is, the keyword similarity evaluation unit 109 totals the evaluations made by the partial character string similarity evaluation unit 110.

In this embodiment, the length of a partial character string is counted in bytes; in other embodiments it may be counted in characters.
FIG. 8 is referred to here to explain the scoring in step S310. FIG. 8 is a diagram illustrating constant values used in processing by the determination apparatus 100.

For convenience of explanation, FIG. 8 shows the scoring information 501 and the reference value information 502 in table form, but both may instead be defined as constants in the program executed by the CPU 301. Alternatively, they may be defined in a file on the storage device 307 and read by the CPU 301.

According to the scoring information 501 in FIG. 8, lengths of 1 to 10 bytes are assigned scores of 1, 2, 3, 4, 5, 7, 8, 10, 11, and 12 points, respectively. Scores for lengths of 11 bytes or more are omitted from FIG. 8.

As above, the length of a character string s is written |s|, and the score assigned to the length |s| in the scoring information 501 is written f(|s|). In this notation, the scoring information 501 of this embodiment is defined so that the following Expression (1) holds for any character strings s_1 and s_2.
f(|s_1|) + f(|s_2|) ≤ f(|s_1| + |s_2|)   (1)

For example, according to the scoring information 501, 1-byte and 7-byte character strings score 1 point and 8 points, respectively, and an 8-byte (= 1 + 7) character string scores 10 points. Since 10 points is at least the sum of 1 point and 8 points, Expression (1) holds when |s_1| = 1 and |s_2| = 7. Expression (1) also holds in the other cases covered by the scoring information 501 of FIG. 8.
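
The claim that Expression (1) holds for every case in the listed table can be checked mechanically. The sketch below assumes only the ten scores quoted above for FIG. 8; the names `SCORES` and `violations` are illustrative, and lengths of 11 bytes or more are not checked because their scores are omitted from FIG. 8.

```python
# Scoring information 501 as listed for FIG. 8: length in bytes -> score.
SCORES = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 7, 7: 8, 8: 10, 9: 11, 10: 12}


def violations(scores):
    """Return (m, n) pairs with f(m) + f(n) > f(m + n), where all three are defined."""
    return [
        (m, n)
        for m in scores
        for n in scores
        if m <= n and (m + n) in scores and scores[m] + scores[n] > scores[m + n]
    ]


print(violations(SCORES))  # [] : Expression (1) holds wherever the table is defined
```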

With scoring information 501 that satisfies Expression (1), the partial character string similarity evaluation unit 110 evaluates partial character strings as follows. When partial character strings of a third length, equal to the sum of a first length and a second length, match each other, the unit gives an evaluation at least as high as the sum of the evaluation for a match between partial character strings of the first length and the evaluation for a match between partial character strings of the second length.

The significance of Expression (1) and the reference value information 502 in FIG. 8 are explained after the description of the score calculation process of FIG. 7 is complete; the description now returns to FIG. 7.
In step S311, the keyword similarity evaluation unit 109 determines whether the division pattern Q_a has been examined to its end, that is, whether the determination of step S309 has been completed for the last partial character string in Q_a.

If the division pattern Q_a has not yet been examined to its end, the process proceeds to step S312. Otherwise, the process proceeds to step S313.

In step S312, the keyword similarity evaluation unit 109 increments the variable j by 1 in order to move on to the next partial character string in the division pattern Q_a. The process then returns to step S308.

In step S313, the keyword similarity evaluation unit 109 determines whether the value of the variable score exceeds the value of the variable max.
If score exceeds max, the similarity calculated for the current combination of division patterns Q_a and Q_b is higher than the similarity calculated for any combination examined so far. In that case, the process proceeds to step S314 to record the score obtained from the current combination of Q_a and Q_b as the highest similarity. If score is less than or equal to max, the process proceeds to step S315.

In step S314, the keyword similarity evaluation unit 109 assigns the value of score to max. The process then proceeds to step S315.
In step S315, the keyword similarity evaluation unit 109 increments the variable b by 1. The process then proceeds to step S316.

In step S316, the keyword similarity evaluation unit 109 determines whether the similarity calculation has been completed for every division pattern of keyword B against the division pattern Q_a of keyword A currently under consideration. Specifically, writing c(s) for the number of characters in a character string s, keyword B has 2^{c(B)-1} division patterns, so the keyword similarity evaluation unit 109 determines whether the value of the variable b exceeds 2^{c(B)-1}.

If b exceeds 2^{c(B)-1}, the process proceeds to step S317 to examine the next division pattern of keyword A. Conversely, if b is 2^{c(B)-1} or less, the process returns to step S305 to calculate the similarity between the current division pattern Q_a of keyword A and the next division pattern of keyword B.

In step S317, the keyword similarity evaluation unit 109 increments the variable a by 1. The process then proceeds to step S318.
In step S318, the keyword similarity evaluation unit 109 determines whether the similarity calculation has been completed for every division pattern of keyword A. Specifically, keyword A has 2^{c(A)-1} division patterns, so the keyword similarity evaluation unit 109 determines whether the value of the variable a exceeds 2^{c(A)-1}.

If a exceeds 2^{c(A)-1}, the process proceeds to step S319. Conversely, if a is 2^{c(A)-1} or less, the process returns to step S303 to calculate the similarity for the next division pattern of keyword A.

In step S319, the keyword similarity evaluation unit 109 returns the value of max as the score indicating the similarity between keywords A and B.
The above description of FIG. 7 does not restrict which of the 2^{c(A)-1} division patterns of keyword A is counted as which. The keyword similarity evaluation unit 109 may order the 2^{c(A)-1} division patterns in any order, and likewise the 2^{c(B)-1} division patterns of keyword B.
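
The count of 2^{c-1} division patterns follows from choosing, independently for each of the c-1 gaps between adjacent characters, whether to cut there. A minimal sketch of one possible enumeration is shown below; the function name `division_patterns` and the particular ordering are assumptions, which is acceptable because the description leaves the ordering open. A non-empty keyword is assumed.

```python
from itertools import product


def division_patterns(keyword):
    """Yield every split of a non-empty keyword into contiguous non-empty pieces."""
    n = len(keyword)
    # One boolean per gap between adjacent characters: True means "cut here".
    for cuts in product((False, True), repeat=n - 1):
        pieces, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:
                pieces.append(keyword[start:i])
                start = i
        pieces.append(keyword[start:])
        yield pieces


print(len(list(division_patterns("abcde"))))  # 16, i.e. 2 ** (5 - 1)
```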

The score calculated from keywords A and B by the score calculation process of FIG. 7 is thus the highest of the scores obtained from all combinations of division patterns of A and B. It is obtained by cumulatively adding scores that the scoring information 501 defines so as to satisfy Expression (1).

It follows that the evaluation by the keyword similarity evaluation unit 109 gives a reasonable score even when keywords A and B do not match completely, provided that some short partial character strings match. At the same time, the evaluation is higher the longer the partial character strings that match between A and B.

To explain the evaluation by the keyword similarity evaluation unit 109 in more detail, Expression (1) is taken up again.
The left side of Expression (1) is the number of points added when a partial character string s_1 matches between two keywords and a partial character string s_2 also matches between them, each match contributing separately. The right side is the number of points added when the concatenation of s_1 and s_2 (written "s_1·s_2" for convenience) matches between the two keywords. Expression (1) therefore states that a match of the whole concatenated string (s_1·s_2) receives at least as many points as separate matches of s_1 and s_2 at positions apart from each other.

Further, when Expression (2) below holds, Expression (3) follows from Expression (1), and adding f(|s_1|) to both sides of Expression (3) gives Expression (4). Since the right side of Expression (4) is at most f(|s_1| + |s_2|) by Expression (1), Expression (5) is obtained using Expression (2).
|s_2| = |s_3| + |s_4|   (2)
f(|s_3|) + f(|s_4|) ≤ f(|s_2|)   (3)
f(|s_1|) + f(|s_3|) + f(|s_4|) ≤ f(|s_1|) + f(|s_2|)   (4)
f(|s_1|) + f(|s_3|) + f(|s_4|) ≤ f(|s_1| + |s_3| + |s_4|)   (5)
Expression (5) states that a match of the whole string (s_1·s_3·s_4), in which s_1, s_3, and s_4 are contiguous, receives at least as many points as separate matches of s_1, s_3, and s_4 at three positions apart from one another.

Deriving expressions in the same way shows that the following holds for any R of 2 or more: according to the scoring information 501, a match of the whole character string formed by concatenating R partial character strings receives at least as many points as separate matches of those R partial character strings at R positions apart from one another.
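
The generalization to R partial character strings can be written out as an induction step. The block below is a sketch under the assumption that f is defined for all the lengths involved and satisfies Expression (1); the symbol \ell_r for the length of the r-th partial character string is introduced here for illustration only.

```latex
% Assumption: f satisfies Expression (1) for all lengths involved.
% Induction hypothesis (true for R = 2 by Expression (1)):
%   \sum_{r=1}^{R} f(\ell_r) \le f\bigl(\sum_{r=1}^{R} \ell_r\bigr).
\begin{align*}
\sum_{r=1}^{R+1} f(\ell_r)
  &= \sum_{r=1}^{R} f(\ell_r) + f(\ell_{R+1}) \\
  &\le f\Bigl(\sum_{r=1}^{R} \ell_r\Bigr) + f(\ell_{R+1})
     && \text{(induction hypothesis)} \\
  &\le f\Bigl(\sum_{r=1}^{R+1} \ell_r\Bigr)
     && \text{(Expression (1))}
\end{align*}
```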

Therefore, with scoring information 501 defined to satisfy Expression (1), the highest score that the score calculation process of FIG. 7 can yield for a given combination of keywords A and B is one of the following (b1) to (b3).
(b1) If keywords A and B have the same length, the score associated with the length |A| (that is, the length |B|) in the scoring information 501.
(b2) If keyword A is shorter than keyword B, the score associated with the length |A| in the scoring information 501.
(b3) If keyword B is shorter than keyword A, the score associated with the length |B| in the scoring information 501.

Whether the score calculated by the score calculation process of FIG. 7 for keywords A and B meets a criterion under which "keywords A and B match" may be assumed is therefore reasonably judged according to the length of A, of B, or of both. Accordingly, as shown in FIG. 8, the determination apparatus 100 of this embodiment uses reference value information 502, which defines, as the reference value for each length (counted in bytes in this embodiment), the highest score attainable for that length multiplied by a constant β less than 1.

Specifically, in step S206 of FIG. 6, the reference value associated with the length of the selected side effect in the reference value information 502 is used as the criterion for regarding the selected side effect and the selected keyword as matching. That is, in step S206 the side effect determination/learning unit 108 determines whether the score calculated by the keyword similarity evaluation unit 109 in step S205 is greater than or equal to that reference value. If it is, the side effect determination/learning unit 108 determines that the criterion is met, and the process proceeds to step S207.

Note that in step S206 of FIG. 6, if the selected keyword is shorter than the selected side effect, the score calculated in step S205 clearly falls short of the "highest attainable score" associated with the length of the selected side effect in the reference value information 502 of FIG. 8. Even so, step S206 uses the reference value associated with the length of the selected side effect, as described above. This is because the process of FIG. 6 takes the viewpoint of the selected side effect: its purpose is to determine whether the side effect keyword group of the selected similar medicine contains a side effect highly similar to the selected side effect.
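
The comparison in step S206 can be sketched as follows. The value of β is not stated in this passage, so `beta=0.8` below is purely a placeholder; the function names are likewise illustrative, and only the FIG. 8 scores for 1 to 10 bytes are available.

```python
# Scoring information 501 as listed for FIG. 8: length in bytes -> highest score.
SCORES = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 7, 7: 8, 8: 10, 9: 11, 10: 12}


def reference_value(side_effect_len, beta):
    """Reference value: highest attainable score for the length, times beta (< 1)."""
    return beta * SCORES[side_effect_len]


def meets_reference(score, side_effect_len, beta):
    """Step S206 sketch: compare a FIG. 7 score against the reference value."""
    return score >= reference_value(side_effect_len, beta)


# Placeholder beta; the description only says that beta is less than 1.
print(meets_reference(11, 10, beta=0.8))  # True
print(meets_reference(4, 10, beta=0.8))   # False
```

The second call illustrates the point made above: a 4-byte keyword can score at most 4 points, which cannot reach a reference value derived from a 10-byte selected side effect.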

The scoring information 501 and the reference value information 502 of FIG. 8 are also used by the similar medicine processing unit 103; that use is described later with reference to FIGS. 11 to 14.
Scores satisfying Expression (1) may be chosen and set based on human knowledge. In this embodiment, for example, the scoring information 501 is set according to the policy "for lengths of 1 to 5 bytes, the score is the length multiplied by a weight of 1; for lengths of 6 to 10 bytes, the score is the length multiplied by a weight of 1.2 with the fractional part discarded".
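
The stated policy can be turned into a small table generator and compared with the values listed for FIG. 8. The sketch below is an illustration only; `policy_score` and its `rounding` parameter are hypothetical names, and integer arithmetic in tenths is used to avoid floating-point error when multiplying by 1.2.

```python
FIG8_SCORES = [1, 2, 3, 4, 5, 7, 8, 10, 11, 12]  # lengths 1..10 bytes


def policy_score(length, rounding):
    """Weight 1 for lengths 1-5, weight 1.2 for lengths 6-10."""
    if length <= 5:
        return length
    tenths = 12 * length  # length * 1.2, expressed in tenths
    if rounding == "truncate":
        return tenths // 10      # discard the fractional part
    return (tenths + 5) // 10    # round half up


truncated = [policy_score(n, "truncate") for n in range(1, 11)]
rounded = [policy_score(n, "round") for n in range(1, 11)]
print(truncated)  # [1, 2, 3, 4, 5, 7, 8, 9, 10, 12]
print(rounded)    # [1, 2, 3, 4, 5, 7, 8, 10, 11, 12]
```

In this sketch, rounding to the nearest integer reproduces the listed FIG. 8 values, whereas strict truncation gives 9 and 10 for lengths of 8 and 9 bytes, so the wording of the policy and the listed values appear to differ at those two lengths.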

Alternatively, Expression (1) can be satisfied by defining the score f(|s|) for any length |s| using a suitable positive constant C, for example as in Expression (6).
f(|s|) = C|s|^2   (6)
Expression (6) sets the score per unit length to a value proportional to the character string length |s| (namely C|s|), so that a match between character strings of length |s| receives C|s|^2 points. More generally, and not only in the example of Expression (6), any scoring defined so that the score per unit length increases monotonically with the character string length satisfies Expression (1).
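
Both claims can be verified in a few lines. The derivation below is a sketch in which a and b stand for the lengths |s_1| and |s_2|, and g(n) = f(n)/n denotes the score per unit length; these symbols are introduced here for illustration.

```latex
% Assumption: a, b are positive lengths and C > 0.
% Quadratic scoring of Expression (6):
\begin{align*}
f(a+b) - f(a) - f(b) &= C(a+b)^2 - Ca^2 - Cb^2 = 2Cab \ge 0 .
\end{align*}
% General case: g(n) = f(n)/n is non-decreasing in n.
\begin{align*}
f(a) + f(b) = a\,g(a) + b\,g(b)
  \le a\,g(a+b) + b\,g(a+b) = (a+b)\,g(a+b) = f(a+b) .
\end{align*}
```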

To aid understanding of the score calculation process described above with reference to FIGS. 7 and 8, a concrete example is given with reference to FIG. 9.
FIG. 9 schematically illustrates a concrete example of the score calculation process. Keyword A of FIG. 7 corresponds in FIG. 9 to the keyword 601, "全身麻酔剤" (general anesthetic), and keyword B of FIG. 7 corresponds to the keyword 602, "全身吸入麻酔剤" (general inhalation anesthetic).

Keyword 601 has five characters, so it has 16 (= 2^{5-1}) division patterns. As examples, FIG. 9 shows the division pattern 603a "全身／麻酔／剤", the division pattern 603b "全身麻酔剤", and the division pattern 603c "全身麻酔／剤". The other 13 division patterns are omitted from FIG. 9.

Keyword 602 has seven characters, so it has 64 (= 2^{7-1}) division patterns. As examples, FIG. 9 shows the division pattern 604a "全身／吸入／麻酔／剤", the division pattern 604b "全身吸入／麻酔／剤", and the division pattern 604c "全身／吸入麻酔／剤". The other 61 division patterns are omitted from FIG. 9.

In FIG. 9, for convenience of explanation, the score associated in the scoring information 501 of FIG. 8 with the length of each partial character string is written beneath the rectangle representing that partial character string in each division pattern.

According to the score calculation process of FIG. 7, scores are calculated for all combinations of the 16 division patterns of keyword 601 and the 64 division patterns of keyword 602, and the highest score is obtained as the score indicating the similarity between keywords 601 and 602.

For example, for the combination of division patterns 603a and 604a, the score is calculated in FIG. 7 as follows.
In step S309 with j = 1, a partial character string matching "全身", the first partial character string of division pattern 603a, is found in division pattern 604a. In step S310, the 4 points associated in the scoring information 501 of FIG. 8 with the length of "全身" (4 bytes) are therefore added.

In step S309 with j = 2, a partial character string matching "麻酔", the second partial character string of division pattern 603a, is found in division pattern 604a. In step S310, the 4 points corresponding to the length of "麻酔" (4 bytes) are therefore added.

In step S309 with j = 3, a partial character string matching "剤", the third partial character string of division pattern 603a, is found in division pattern 604a. In step S310, the 2 points corresponding to the length of "剤" (2 bytes) are therefore added.

As a result, the score obtained for the combination of division patterns 603a and 604a (that is, the value of the variable score when step S313 is executed) is 10 (= 4 + 4 + 2) points.

Similarly, the score obtained for the combination of division patterns 603a and 604b is 6 (= 4 + 2) points in total, added for the matches of the partial character strings "麻酔" and "剤". The score obtained for the combination of division patterns 603a and 604c is also 6 (= 4 + 2) points in total, added for the matches of the partial character strings "全身" and "剤".

In the example of keywords 601 and 602 in FIG. 9, the highest score is the 11 (= 4 + 7) points obtained from the combination of the division pattern "全身／麻酔剤" and the division pattern "全身／吸入／麻酔剤", neither of which is illustrated. The score that the keyword similarity evaluation unit 109 returns for keywords 601 and 602 as the result of the score calculation process of FIG. 7 is therefore 11 points.
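
The worked example can be reproduced with a brute-force sketch of the FIG. 7 loop. Several simplifications are assumed here and are not taken from the description: every character is counted as 2 bytes (consistent with "全身" being 4 bytes above), synonym matching is omitted, only the FIG. 8 scores for 1 to 10 bytes are available, and all function names are illustrative.

```python
from itertools import product

# Scoring information 501 as listed for FIG. 8: length in bytes -> score.
SCORES = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 7, 7: 8, 8: 10, 9: 11, 10: 12}


def byte_len(s):
    # Assumption: every character occupies 2 bytes, as in the FIG. 9 example.
    return 2 * len(s)


def division_patterns(keyword):
    """Yield every split of a non-empty keyword into contiguous pieces."""
    for cuts in product((False, True), repeat=len(keyword) - 1):
        pieces, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:
                pieces.append(keyword[start:i])
                start = i
        pieces.append(keyword[start:])
        yield pieces


def pair_score(pattern_a, pattern_b):
    """S308-S312 sketch: add f(|substr|) for each piece of Q_a found in Q_b."""
    return sum(SCORES[byte_len(p)] for p in pattern_a if p in pattern_b)


def similarity(keyword_a, keyword_b):
    """S303-S319 sketch: highest score over all pairs of division patterns."""
    patterns_b = list(division_patterns(keyword_b))
    return max(
        pair_score(qa, qb)
        for qa in division_patterns(keyword_a)
        for qb in patterns_b
    )


print(pair_score(["全身", "麻酔", "剤"], ["全身", "吸入", "麻酔", "剤"]))  # 10
print(similarity("全身麻酔剤", "全身吸入麻酔剤"))  # 11
```

The exhaustive search visits 16 × 64 = 1024 pattern pairs for this example. Because the number of division patterns doubles with each additional character, this direct approach is practical only for short keywords.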

The score calculation process in step S205 of FIG. 6 has been described in detail above with reference to FIGS. 7 to 9. Next, the score normalization process in step S210 of FIG. 6 is described in detail.

FIG. 10 is a flowchart of the score normalization process.
In step S401, the side effect determination/learning unit 108 determines whether a specific value of P_1 has been given as the first argument or the first argument is NULL. If the first argument is NULL, the process proceeds to step S402. If a specific value of P_1 has been given as the first argument, the process proceeds to step S403.

In step S402, the side effect determination/learning unit 108 returns 0, and the score normalization process ends. Step S402 handles the case in which no argument has been given a specific value.

For example, if in the process of FIG. 6 no selected keyword is judged in step S206 to meet the matching criterion, step S402 of FIG. 10 is executed, and the normalized score is therefore 0 points.

In step S403, the side effect determination/learning unit 108 determines whether a specific value of P_2 has been given as the second argument or the second argument is NULL. If the second argument is NULL, the process proceeds to step S404. If a specific value of P_2 has been given as the second argument, the process proceeds to step S405.

Step S404 handles the case in which exactly one argument has been given a specific value. In this case, the side effect determination/learning unit 108 returns that single value (that is, the value of P_1) itself as the normalized value, and the score normalization process ends.

For example, if in the process of FIG. 6 exactly one selected keyword is judged in step S206 to meet the matching criterion, step S404 of FIG. 10 is executed, and the normalized score is therefore the score obtained for that selected keyword.

In step S405, the side effect determination/learning unit 108 determines whether a specific value of P_3 has been given as the third argument or the third argument is NULL. If the third argument is NULL, the process proceeds to step S406. If a specific value of P_3 has been given as the third argument, the process proceeds to step S407.

Step S406 handles the case in which exactly two arguments have been given specific values. In this case, the side effect determination/learning unit 108 calculates a value that normalizes the two values (that is, the values of P_1 and P_2) according to Expression (7) and returns the calculated value, and the score normalization process ends. P_R on the left side of Expression (7), and of Expression (8) described later, denotes the normalized score.

Step S407, on the other hand, handles the case in which specific values have been given for all three arguments. In this case, the side effect determination/learning unit 108 calculates a value that normalizes the three values (that is, the values of P_1, P_2, and P_3) according to Expression (8) and returns the calculated value, and the score normalization process ends.
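
The branching of FIG. 10 can be sketched as follows. Expressions (7) and (8) are not reproduced in this text, so the sketch does not attempt to implement them: the caller must supply a `combine` function as a stand-in, and that parameter, the function name, and the use of `None` for NULL are all assumptions made for illustration.

```python
def normalize_scores(p1=None, p2=None, p3=None, *, combine):
    """FIG. 10 dispatch sketch; `combine` stands in for Expressions (7) and (8)."""
    if p1 is None:                 # S401 -> S402: no value given
        return 0
    if p2 is None:                 # S403 -> S404: exactly one value given
        return p1
    if p3 is None:                 # S405 -> S406: exactly two values given
        return combine([p1, p2])
    return combine([p1, p2, p3])   # S407: all three values given
```

Only the first two branches are fully determined by the description above; the values returned by the last two depend entirely on the formulas that `combine` is assumed to encapsulate.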

The processing by the side effect processing unit 102 described above with reference to FIGS. 1 to 10 uses the learning result table 204. As mentioned earlier, some fields of the learning result table 204 are learned in advance by preprocessing in this embodiment. The preprocessing is therefore described below.

Before normal operation of the determination apparatus 100 of FIG. 1 begins, a series of preprocessing steps such as (c1) to (c4) below is performed. If an attached document 202 is updated or added after normal operation of the determination apparatus 100 has begun, preprocessing such as (c5) below is performed.

(c1) Storing the attached document group 201 in the storage unit 101.
(c2) If data for the synonym dictionary 203 is available, storing a synonym dictionary 203 containing that data in the storage unit 101; if no such data is available, creating in the storage unit 101 a synonym dictionary 203 in its initial state with no entries.
(c3) Creating in the storage unit 101 a learning result table 204 in its initial state with no entries.
(c4) Adding entries to the learning result table 204 and learning the fields other than the known side effect list, according to the flowchart of FIG. 11.
(c5) Updating or adding an attached document 202 according to the flowchart of FIG. 14, and learning the learning result table 204 in response to that update or addition.

Note that the preprocessing of (c1) to (c3) may be performed by a system administrator, for example, or by the preprocessing control unit 113. The processes (c4) and (c5), which are performed under the control of the preprocessing control unit 113, are described in detail below with reference to FIGS. 11 to 14.

FIG. 11 is a flowchart of the preprocessing (c4) that the determination apparatus 100 performs before the start of normal operation.
In step S501, the preprocessing control unit 113 obtains the number of drugs for which an attached document 202 is registered, for example by counting the files of the attached documents 202, and stores that number in the variable N.

In the next step S502, the preprocessing control unit 113 initializes to 1 the variable i, which is used to visit in turn each drug for which an attached document 202 is registered. For simplicity, the attached document 202 of the i-th drug is referred to below simply as the “i-th attached document 202”.

Subsequently, in step S503, the preprocessing control unit 113 adds an entry for the i-th drug to the learning result table 204.
As described with reference to FIG. 1, in the present embodiment an attached document 202 is associated with a drug ID because the file name of the attached document 202 contains that ID. The preprocessing control unit 113 therefore selects the i-th attached document 202 from the attached document group 201 and recognizes the corresponding ID from the file name of the selected attached document 202. The preprocessing control unit 113 then adds to the learning result table 204 an entry in which the “ID” field is set to the recognized ID and all other fields are initialized to empty.

In the next step S504, the preprocessing control unit 113 instructs the efficacy/effect keyword extraction unit 111 to extract keywords from the “efficacy or effect” section of the i-th attached document 202 and register them in the learning result table 204. The efficacy/effect keyword extraction unit 111 then extracts the keywords and registers them in the learning result table 204 as instructed.

As with the keyword extraction performed by the side effect keyword extraction unit 107 in step S104 of FIG. 3, the algorithm that the efficacy/effect keyword extraction unit 111 uses for keyword extraction may vary depending on the embodiment.

For example, the efficacy/effect keyword extraction unit 111 may perform morphological analysis on the “efficacy or effect” section of the i-th attached document 202 and extract each sequence of one or more nouns as a keyword. The efficacy/effect keyword extraction unit 111 may also perform syntactic analysis on the result of the morphological analysis and extract keywords using the result of that syntactic analysis.

Alternatively, the efficacy/effect keyword extraction unit 111 may perform a simple keyword extraction process based on character type, such as extracting each sequence of kanji, katakana, or alphabetic characters as a keyword. If dictionary data of medical terms that may be described as an efficacy or effect is available, the efficacy/effect keyword extraction unit 111 may instead extract as keywords those character strings in the “efficacy or effect” section of the i-th attached document 202 that match dictionary entries.
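The character-type approach can be illustrated with a minimal sketch. The function name, the exact Unicode ranges, and the choice to treat a mixed run of kanji, katakana, and ASCII letters as a single keyword are assumptions made for illustration; the source names only the three character types.

```python
import re

# Assumption: a keyword is a maximal run of kanji, katakana (including the
# long-vowel mark), or ASCII letters; every other character is a separator.
KEYWORD_RUN = re.compile(r"[\u4e00-\u9fff\u30a1-\u30fa\u30fcA-Za-z]+")


def extract_keywords_by_char_type(section_text: str) -> list[str]:
    """Return keyword candidates in order of appearance."""
    return KEYWORD_RUN.findall(section_text)
```

With this pattern, hiragana particles and punctuation such as the middle dot act as delimiters, so a phrase is split at those positions.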

In any case, in step S504 the efficacy/effect keyword extraction unit 111 extracts keywords from the “efficacy or effect” section of the i-th attached document 202 using an algorithm appropriate to the embodiment. The efficacy/effect keyword extraction unit 111 then registers the extracted keywords in the efficacy/effect keyword group field of the learning result table 204 entry that was added for the i-th drug in step S503. Because the preprocessing control unit 113 notifies the efficacy/effect keyword extraction unit 111 of the ID of the i-th drug, the efficacy/effect keyword extraction unit 111 can identify both the attached document 202 from which to extract keywords and the entry in which to register them.

In the next step S505, the preprocessing control unit 113 instructs the side effect keyword extraction unit 107 to extract keywords from the “side effect” section of the i-th attached document 202 and register them in the learning result table 204. The side effect keyword extraction unit 107 then extracts the keywords and registers them in the learning result table 204 as instructed.

The keyword extraction algorithm in step S505 may also vary depending on the embodiment, as in step S104 of FIG. 3. Because the preprocessing control unit 113 notifies the side effect keyword extraction unit 107 of the ID of the i-th drug, the side effect keyword extraction unit 107 can identify both the attached document 202 from which to extract keywords and the entry in which to register them.

In the next step S506, the preprocessing control unit 113 increments the variable i by 1. The process then proceeds to step S507. Note that steps S504 and S505 may be performed in the reverse order.

In step S507, the preprocessing control unit 113 determines whether the value of the variable i exceeds the value of the variable N. If it does, steps S503 to S505 have been completed for every attached document 202 in the attached document group 201, so the process proceeds to step S508. Conversely, if the value of i is less than or equal to the value of N, attached documents 202 that have not yet undergone steps S503 to S505 remain, so the process returns to step S503.

In step S508, the preprocessing control unit 113 initializes the variable i to 1 again. The process then proceeds to step S509.
In step S509, the preprocessing control unit 113 instructs the similar drug determination/learning unit 112 to learn the similar drugs of the i-th drug, and the similar drug determination/learning unit 112 learns them.

In the present embodiment, the similar drug relationship is symmetric. That is, for any i and j, if the j-th drug is a similar drug of the i-th drug, then the i-th drug is a similar drug of the j-th drug. Accordingly, in step S509 the preprocessing control unit 113 specifies to the similar drug determination/learning unit 112 the ID of the i-th drug as the drug whose similar drugs are to be learned, and specifies the (i+1)-th to N-th drugs as the comparison range to be examined for similarity to the i-th drug.

The operation of the similar drug determination/learning unit 112 in step S509 is described in detail later with reference to FIG. 12; in outline, it is as follows.
The similar drug determination/learning unit 112 determines whether each drug in the comparison range specified by the preprocessing control unit 113 is a similar drug of the i-th drug. If a similar drug of the i-th drug is found, the similar drug determination/learning unit 112 adds the ID of that similar drug to the similar drug list of the learning result table 204 entry corresponding to the i-th drug. The similar drug determination/learning unit 112 also adds the ID of the i-th drug to the similar drug list of the learning result table 204 entry corresponding to the similar drug that was found.

Thereafter, in step S510, the preprocessing control unit 113 increments the variable i by 1. The process then proceeds to step S511.
In step S511, the preprocessing control unit 113 determines whether the value of the variable i is greater than or equal to the value of the variable N. If it is, similar drugs have been learned for every drug for which an attached document 202 is registered, so the preprocessing of FIG. 11 also ends. If the value of i is less than the value of N, the process returns to step S509.
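The two passes of FIG. 11 can be summarized in a short sketch. It is only an illustration under stated assumptions: the function name, the dictionary layout of the table, the field names, and the three callbacks (which stand in for the keyword extraction units and for the similarity judgment of FIG. 12) are hypothetical and do not appear in the source.

```python
from typing import Callable


def run_initial_preprocessing(
    drug_ids: list[str],
    extract_efficacy: Callable[[str], list[str]],
    extract_side_effects: Callable[[str], list[str]],
    is_similar: Callable[[dict, str, str], bool],
) -> dict[str, dict]:
    """Build a learning-result table keyed by drug ID (hypothetical layout)."""
    table: dict[str, dict] = {}

    # First pass (S502 to S507): one entry per drug, then keyword extraction.
    for drug_id in drug_ids:
        table[drug_id] = {
            "efficacy_keywords": extract_efficacy(drug_id),          # S504
            "side_effect_keywords": extract_side_effects(drug_id),   # S505
            "similar_drugs": [],
        }

    # Second pass (S508 to S511): compare drug i only with drugs i+1..N,
    # and record each hit in both entries because the relation is symmetric.
    n = len(drug_ids)
    for i in range(n - 1):
        for j in range(i + 1, n):
            a, b = drug_ids[i], drug_ids[j]
            if is_similar(table, a, b):
                table[a]["similar_drugs"].append(b)
                table[b]["similar_drugs"].append(a)
    return table
```

Restricting the comparison range to the drugs after position i means each unordered pair is examined once, which is what the symmetry of the relation allows.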

FIG. 12 is a flowchart of the similar drug learning process that the similar drug determination/learning unit 112 performs in step S509 of FIG. 11 and in step S808 of FIG. 14, described later. In the present embodiment, the preprocessing control unit 113 specifies to the similar drug determination/learning unit 112 the ID (hereinafter the “learning target drug ID”) of the drug whose similar drugs are to be learned (hereinafter the “learning target drug”). The preprocessing control unit 113 also specifies to the similar drug determination/learning unit 112 the lower and upper limits of the comparison range within which drugs are compared to determine whether they are similar drugs of the learning target drug.

In step S601, the similar drug determination/learning unit 112 reads what has already been learned about the learning target drug and stores it as a “similar drug learning list”. That is, the similar drug determination/learning unit 112 searches the learning result table 204 for the entry whose ID is the learning target drug ID specified by the preprocessing control unit 113, reads the similar drug list of the entry found, and stores it as the similar drug learning list.

In step S602, the similar drug determination/learning unit 112 initializes a “similar drug candidate list” to empty.
In step S603, the similar drug determination/learning unit 112 determines whether any unprocessed drug remains within the comparison range specified by the preprocessing control unit 113. If the range still contains a drug that has not yet been judged as to whether it is a similar drug of the learning target drug, the process proceeds to step S604. Conversely, if every drug in the specified range has already been judged, the process proceeds to step S617.

In step S604, the similar drug determination/learning unit 112 selects one unprocessed drug within the comparison range specified by the preprocessing control unit 113. The drug selected in step S604 is referred to below as the “selected drug”, and its ID as the “selected drug ID”.

In the next step S605, the similar drug determination/learning unit 112 determines whether the selected drug ID equals the learning target drug ID. If they are equal, there is no need to examine whether the learning target drug is a similar drug of itself, so the process returns to step S603. If they differ, the process proceeds to step S606.

In step S606, the similar drug determination/learning unit 112 determines whether the selected drug ID is included in the similar drug learning list.
If the selected drug ID is included in the similar drug learning list, the selected drug has already been learned as a similar drug of the learning target drug and is already registered in the learning result table 204, so no further processing is needed for the selected drug. The process therefore returns to step S603. Conversely, if the selected drug ID is not included in the similar drug learning list, the process proceeds to step S607.

In step S607, the similar drug determination/learning unit 112 refers to and compares the “therapeutic category name” sections of the attached documents 202 of the learning target drug and the selected drug. As described above, the “therapeutic category name” section lists one or more names such as “antipyretic, analgesic, and anti-inflammatory agent”, so the similar drug determination/learning unit 112 checks whether the learning target drug and the selected drug share any therapeutic category name.

In the next step S608, the similar drug determination/learning unit 112 determines, from the comparison in step S607, whether the learning target drug and the selected drug share a therapeutic category name. If they do, the learning target drug and the selected drug are similar drugs, so the process proceeds to step S609 in order to learn the selected drug as a similar drug of the learning target drug. Conversely, if they share no therapeutic category name, the process proceeds to step S610 in order to examine further, based on the descriptions in other sections, whether the selected drug is a similar drug of the learning target drug.

In step S609, the similar drug determination/learning unit 112 adds the selected drug ID to the similar drug learning list. The process then returns to step S603.
In step S610, the similar drug determination/learning unit 112 refers to and compares the “physicochemical findings on active ingredients” sections of the attached documents 202 of the learning target drug and the selected drug.

As described above, the “physicochemical findings on active ingredients” section contains a generic name, a chemical name, a molecular formula, and a structural formula. For a drug containing multiple active ingredients, the generic name, chemical name, molecular formula, and structural formula are given for each active ingredient.

In the present embodiment, therefore, the similar drug determination/learning unit 112 extracts the generic names and chemical names from the “physicochemical findings on active ingredients” sections of the attached documents 202 of the learning target drug and the selected drug, and compares generic names with generic names and chemical names with chemical names.

In step S611, the similar drug determination/learning unit 112 determines whether the learning target drug and the selected drug share a generic name or a chemical name. If they do, the learning target drug and the selected drug are similar drugs, so the process proceeds to step S612 in order to learn the selected drug as a similar drug of the learning target drug. Conversely, if no matching generic name or chemical name is found, the process proceeds to step S613 in order to examine further, based on the descriptions in other sections, whether the selected drug is a similar drug of the learning target drug.

In an embodiment in which structural formulas are written in a predetermined markup language within the attached documents 202, the similar drug determination/learning unit 112 may additionally compare the structural formulas in step S610. In that case, in step S611 the similar drug determination/learning unit 112 may judge the selected drug to be a similar drug of the learning target drug if the structural formulas of the two drugs match, and then execute step S612.
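The exact-match checks of steps S607 to S611 amount to a short cascade of set intersections. The sketch below is illustrative only: the function name and the dictionary keys are hypothetical, and it assumes the relevant names have already been extracted from each attached document as lists of strings.

```python
def matches_by_category_or_name(target: dict, selected: dict) -> bool:
    """Return True if the two drugs are judged similar without needing S613.

    Each argument is a hypothetical record with the keys
    'category_names', 'generic_names', and 'chemical_names'.
    """
    # S607/S608: any shared therapeutic category name is sufficient.
    if set(target["category_names"]) & set(selected["category_names"]):
        return True
    # S610/S611: generic names are compared with generic names,
    # and chemical names with chemical names.
    if set(target["generic_names"]) & set(selected["generic_names"]):
        return True
    if set(target["chemical_names"]) & set(selected["chemical_names"]):
        return True
    # No exact match: the caller would fall through to the
    # efficacy/effect similarity calculation of step S613.
    return False
```

A False result here does not mean the drugs are dissimilar; it only means the cheaper exact-match tests were inconclusive.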

In step S612, the similar drug determination/learning unit 112 adds the selected drug ID to the similar drug learning list. The process then returns to step S603.
In step S613, the similar drug determination/learning unit 112 performs the process shown in FIG. 13 as the “efficacy/effect similarity calculation process”. That is, the similar drug determination/learning unit 112 obtains the similarity between the efficacy/effect keyword groups of the learning target drug and the selected drug, which have already been learned from the “efficacy or effect” sections of the attached documents 202 and registered in the learning result table 204.

Step S613 is described in detail later with reference to FIG. 13, but it uses an algorithm that differs from the general approach of calculating inter-document similarity with a vector space model based on Term Frequency-Inverse Document Frequency (TF-IDF) values. The process of FIG. 13 adopts an algorithm tailored to the purpose of judging, on the basis of efficacy or effect, whether two drugs are similar drugs. It takes two facts into account: medical terminology contains a large number of compound words that share common words, and two drugs are highly likely to be similar drugs if part of their efficacy or effect matches, even when their efficacy or effect is not similar as a whole. As a result, the process of FIG. 13 yields a similarity suited to the determination of similar drugs.

In the next step S614, the similar drug determination/learning unit 112 determines which of three ranges the similarity obtained in step S613 falls into: “γ₁ or more”, “γ₂ or more and less than γ₁”, or “less than γ₂”. In the present embodiment, γ₁ and γ₂ are suitable predetermined thresholds with γ₁ > γ₂.

The thresholds γ₁ and γ₂ are fixed values in the present embodiment. In some embodiments, however, γ₁ and γ₂ may be values defined to vary with the number or length of the keywords in the efficacy/effect keyword groups learned in the learning result table 204 for the learning target drug and the selected drug.

The threshold γ₁ is a reference value at or above which it is appropriate to judge that “the learning target drug and the selected drug are similar drugs”. The threshold γ₂ is a reference value marking the boundary between judging that “the learning target drug and the selected drug are not similar drugs” and judging that “the learning target drug and the selected drug may be similar drugs”.

If the similarity obtained in step S613 is γ₁ or more, the process proceeds from step S614 to step S615. If the similarity is γ₂ or more and less than γ₁, the process proceeds from step S614 to step S616. If the similarity is less than γ₂, the process returns from step S614 to step S603.
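The three-way branch of step S614 can be written as a small classifier. This is a sketch under assumptions: the enum, the function name, and the explicit check that γ₁ > γ₂ are illustrative additions, and the source gives no concrete threshold values.

```python
from enum import Enum


class Verdict(Enum):
    SIMILAR = "similar"          # proceed to S615
    CANDIDATE = "candidate"      # proceed to S616 (user decides later)
    NOT_SIMILAR = "not_similar"  # return to S603


def classify_similarity(similarity: float, gamma1: float, gamma2: float) -> Verdict:
    """Map a similarity score onto the three ranges used in step S614."""
    if not gamma1 > gamma2:
        raise ValueError("gamma1 must be greater than gamma2")
    if similarity >= gamma1:
        return Verdict.SIMILAR
    if similarity >= gamma2:
        return Verdict.CANDIDATE
    return Verdict.NOT_SIMILAR
```

Both boundaries are inclusive on the upper range, matching the wording “γ₁ or more” and “γ₂ or more and less than γ₁”.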

In step S615, the similar drug determination/learning unit 112 adds the selected drug ID to the similar drug learning list. In other words, the similar drug determination/learning unit 112 stores the selected drug as a similar drug of the learning target drug. The process then returns to step S603.

In step S616, the similar drug determination/learning unit 112 adds the selected drug ID to the similar drug candidate list. In other words, the similar drug determination/learning unit 112 stores the selected drug as a drug that may be a similar drug of the learning target drug. The process then returns to step S603.

In this way, the similar drug determination/learning unit 112 executes the processing from step S604 onward for each drug in the comparison range specified by the preprocessing control unit 113, which completes the similar drug learning list and the similar drug candidate list for the learning target drug. Once both lists are complete, the process proceeds from step S603 to step S617 as described above.

In step S617, the similar drug determination/learning unit 112 records the contents of the similar drug learning list in the learning result table 204. Specifically, the similar drug determination/learning unit 112 adds each ID in the similar drug learning list to the similar drug list of the learning result table 204 entry whose ID is the learning target drug ID. In addition, for each ID in the similar drug learning list, the similar drug determination/learning unit 112 adds the learning target drug ID to the similar drug list of the learning result table 204 entry whose ID is that ID.
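The two-way recording of step S617 might look like the following minimal sketch. The function name and the representation of the table as a mapping from drug ID to similar drug list are hypothetical. Skipping IDs that are already present is also an assumption: the source does not say how duplicates are handled, but the similar drug learning list starts from the already recorded list (step S601), so some form of deduplication seems necessary.

```python
def record_similar_drugs(
    similar_lists: dict[str, list[str]],
    target_id: str,
    learned_ids: list[str],
) -> None:
    """Record each learned ID under the target drug, and the target under each ID."""
    for other_id in learned_ids:
        # Forward direction: other_id is a similar drug of the target.
        if other_id not in similar_lists[target_id]:
            similar_lists[target_id].append(other_id)
        # Reverse direction: the relation is symmetric.
        if target_id not in similar_lists[other_id]:
            similar_lists[other_id].append(target_id)
```

The same routine would also serve step S620, with the user-confirmed IDs passed in place of the learning list.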

In the next step S618, the similar drug determination/learning unit 112 determines whether the similar drug candidate list is empty.
If the similar drug candidate list is empty, there is no drug whose status as a similar drug of the learning target drug is uncertain. In this case, the similar drug learning process of FIG. 12 ends.

If the similar drug candidate list is not empty, on the other hand, drugs have been found that cannot be conclusively judged to be similar drugs of the learning target drug but that may be. In this case, the process proceeds to step S619 in order to accept the user's judgment as input.

In step S619, the similar drug determination/learning unit 112 accepts input from the user as to whether each drug whose ID is in the similar drug candidate list is a similar drug of the learning target drug. For example, the similar drug determination/learning unit 112 may display the following items (d1) to (d4) on a display corresponding to the output device 306 of FIG. 2.

(d1) The learning target drug ID. The similar drug determination/learning unit 112 may embed in the learning target drug ID a link to the attached document 202 of the learning target drug. As information identifying the learning target drug, the similar drug determination/learning unit 112 may also display the brand name of the learning target drug instead of (or together with) the learning target drug ID.

(d2) Each ID in the similar drug candidate list. The similar drug determination/learning unit 112 may embed in each ID a link to the attached document 202 corresponding to that ID. As information identifying each drug whose ID is in the similar drug candidate list, the similar drug determination/learning unit 112 may also display the brand name of the drug instead of (or together with) its ID.

(d3) A user interface for specifying, for each ID in the similar drug candidate list, whether the drug with that ID is a similar drug of the learning target drug. More specifically, radio buttons, check boxes, pull-down lists, and the like can be used.

(d4) A user interface element, such as a button, for confirming the input.
When a user interface such as (d1) to (d4) above is adopted, for example, the similar drug determination/learning unit 112 waits until the button of (d4) is pressed. When the button of (d4) is pressed, the similar drug determination/learning unit 112 reads in the judgment entered through the user interface of (d3) for each ID in the similar drug candidate list.

In the next step S620, the similar drug determination/learning unit 112 records in the learning result table 204, in accordance with the input accepted in step S619, the IDs of the drugs judged to be similar drugs of the learning target drug.

That is, among the IDs included in the similar drug candidate list, the similar drug determination/learning unit 112 recognizes as a “similar drug ID” each ID for which the input read in through the user interface of (d3) indicates “is a similar drug”. Note that none of the IDs in the similar drug candidate list may be recognized as a similar drug ID, only one ID may be recognized as a similar drug ID, or multiple IDs may be recognized as similar drug IDs.

When the similar drug determination/learning unit 112 has recognized one or more similar drug IDs, it adds each similar drug ID to the similar drug list of the entry in the learning result table 204 whose ID is the learning target drug ID. In addition, for each similar drug ID, the similar drug determination/learning unit 112 adds the learning target drug ID to the similar drug list of the entry in the learning result table 204 whose ID is that similar drug ID.
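
The symmetric update described for step S620 can be sketched as follows. This is only an illustration under stated assumptions: the learning result table 204 is modeled as a plain dictionary from drug ID to a set of similar drug IDs, and the function and parameter names are hypothetical.

```python
def record_similar_drugs(similar_lists, target_id, judgments):
    """Step S620 sketch: register similar drugs in both directions.

    similar_lists: hypothetical dict mapping a drug ID to its set of similar drug IDs
                   (a stand-in for the similar drug list field of table 204).
    judgments:     dict mapping each candidate ID to True ("is a similar drug")
                   or False ("is not a similar drug"), as entered through (d3).
    """
    similar_ids = [cid for cid, is_similar in judgments.items() if is_similar]
    for sid in similar_ids:
        # Add the similar drug to the learning target drug's list ...
        similar_lists.setdefault(target_id, set()).add(sid)
        # ... and the learning target drug to the similar drug's list.
        similar_lists.setdefault(sid, set()).add(target_id)
    # Candidates judged "not similar" receive no particular processing.
    return similar_ids
```

Zero, one, or several IDs may come back from this function, which mirrors the three cases mentioned for the similar drug candidate list.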

Note that the similar drug determination/learning unit 112 performs no particular processing for IDs in the similar drug candidate list for which the input read in through the user interface of (d3) indicates “is not a similar drug”. When the similar drug determination/learning unit 112 finishes step S620 as described above, the similar drug learning process of FIG. 12 also ends.

FIG. 13 is a flowchart of the efficacy/effect similarity calculation process that the similar drug determination/learning unit 112 performs in step S613 of FIG. 12. As described above, the efficacy/effect similarity calculation process is performed for each combination of the learning target drug designated by the preprocessing control unit 113 and the selected drug chosen by the similar drug determination/learning unit 112 in step S604 of FIG. 12.

In step S701, the similar drug determination/learning unit 112 initializes three variables P_1, P_2, and P_3 to NULL. These variables store the similarity scores of those keyword combinations, formed from the efficacy/effect keyword groups of the learning target drug and the selected drug, whose similarity satisfies a certain criterion and ranks within the top three.

Then, in step S702, the similar drug determination/learning unit 112 acquires the efficacy/effect keyword group K_1 of the learning target drug. That is, the similar drug determination/learning unit 112 searches the learning result table 204 for the entry whose ID matches the learning target drug ID and acquires the efficacy/effect keyword group of the entry found. The process then proceeds to step S703.

In step S703, the similar drug determination/learning unit 112 acquires the efficacy/effect keyword group K_2 of the selected drug. That is, the similar drug determination/learning unit 112 searches the learning result table 204 for the entry whose ID matches the selected drug ID and acquires the efficacy/effect keyword group of the entry found. The process then proceeds to step S704. Note that steps S702 and S703 may be executed in the reverse order.

In step S704, the similar drug determination/learning unit 112 determines whether any combination of a keyword in the efficacy/effect keyword group K_1 and a keyword in the efficacy/effect keyword group K_2 remains that has not yet undergone the processing from step S705 onward. If an unprocessed combination remains, the process proceeds to step S705. If all combinations have been processed, the process proceeds to step S711.

In step S705, the similar drug determination/learning unit 112 selects keywords W_1 and W_2 according to one of the unprocessed combinations of a keyword in the efficacy/effect keyword group K_1 and a keyword in the efficacy/effect keyword group K_2. In other words, the similar drug determination/learning unit 112 selects keyword W_1 from K_1 and keyword W_2 from K_2.

Then, in the next step S706, the similar drug determination/learning unit 112 has the keyword similarity evaluation unit 109 evaluate the similarity between keywords W_1 and W_2 and obtains the resulting score. Apart from its arguments, the score calculation process that the keyword similarity evaluation unit 109 performs in step S706 is the same as the one it performs according to the flowchart of FIG. 7 in connection with step S205 of the side effect similarity calculation process of FIG. 6. A detailed description of the score calculation process is therefore omitted here, but one example is as follows.

For example, when the learning target drug ID is “111222A3333” and the selected drug ID is “998877F5050”, “気管支炎” (bronchitis) may be selected from the learning result table 204 of FIG. 4 as keyword W_1 and “急性上気道炎” (acute upper respiratory tract inflammation) as keyword W_2. For this combination, the keyword similarity evaluation unit 109, through the score calculation process shown in FIG. 7, obtains 4 (= 2 + 2) points as a result of the matching substrings “気” and “炎”, and returns a score of 4 to the similar drug determination/learning unit 112.

After having the keyword similarity evaluation unit 109 calculate the score for the combination of keywords W_1 and W_2 in step S706, the similar drug determination/learning unit 112 determines in step S707 whether that score satisfies a predetermined criterion. Specifically, if the score obtained in step S706 is greater than or equal to the reference value associated, in the reference value information 502 of FIG. 8, with the length of the shorter of keywords W_1 and W_2, the similar drug determination/learning unit 112 determines that the predetermined criterion is satisfied. For example, when |W_1| = 10 and |W_2| = 8, the similar drug determination/learning unit 112 determines whether the score is greater than or equal to the reference value (10 × β) associated with a length of 8 bytes in the reference value information 502.

If the score is greater than or equal to the reference value, the similarity between keywords W_1 and W_2 has reached the level at which they may be regarded as “matching”, so the process proceeds to step S708. If the score is less than the reference value, keywords W_1 and W_2 have not reached that level, so the process returns to step S704.
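
The check in step S707 can be sketched as follows. This is a minimal illustration rather than the embodiment itself: the function name, the dictionary layout standing in for the reference value information 502, and the value chosen for β are all assumptions. The text states only that a length of 8 bytes is associated with 10 × β.

```python
def meets_match_criterion(score, len_w1, len_w2, reference_values):
    """Step S707 sketch: compare a score with the reference value of the shorter keyword.

    reference_values is a hypothetical dict mapping a keyword length in bytes to a
    reference value, standing in for the reference value information 502 of FIG. 8.
    """
    shorter = min(len_w1, len_w2)
    return score >= reference_values[shorter]


BETA = 0.5  # assumed for illustration only; the text does not give a value for beta
reference_values = {8: 10 * BETA}  # only the 8-byte entry is stated in the text

# |W_1| = 10 and |W_2| = 8, so the 8-byte reference value applies.
print(meets_match_criterion(6, 10, 8, reference_values))
print(meets_match_criterion(4, 10, 8, reference_values))
```

A result of `True` corresponds to moving on to step S708, and `False` to returning to step S704.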

In step S708, the similar drug determination/learning unit 112 learns the pair of keywords W_1 and W_2 as synonyms. That is, the similar drug determination/learning unit 112 adds an entry pairing keywords W_1 and W_2 to the synonym dictionary 203.

Then, in the next step S709, the similar drug determination/learning unit 112 determines whether the score obtained in step S706 ranks within the top three of the scores obtained so far for keyword combinations that can be regarded as matching each other.

Specifically, if even one of the variables P_1 to P_3 is still NULL from its initial state, the similar drug determination/learning unit 112 determines that “the score obtained in step S706 is within the top three”. When concrete values have already been set in all of P_1 to P_3, the similar drug determination/learning unit 112 determines that “the score obtained in step S706 is within the top three” if that score is greater than the value of P_3 (that is, the third-place score).

Conversely, if concrete values have already been set in all of P_1 to P_3 and the score obtained in step S706 is less than or equal to the value of P_3, the similar drug determination/learning unit 112 determines that “the score obtained in step S706 is not within the top three”.

If the similar drug determination/learning unit 112 determines that “the score obtained in step S706 is within the top three”, the process proceeds to step S710. Otherwise, the process returns to step S704.

In step S710, the similar drug determination/learning unit 112 updates the variables P_1 to P_3 as appropriate according to the score obtained in step S706.
Specifically, if P_1 is NULL, the similar drug determination/learning unit 112 assigns the score obtained in step S706 to P_1. If P_1 is not NULL and P_2 is NULL, it assigns the score to P_2. If P_1 and P_2 are not NULL and P_3 is NULL, it assigns the score to P_3.

On the other hand, when concrete values have been set in all of the variables P_1 to P_3, the similar drug determination/learning unit 112 updates the variables as follows.
That is, if the score obtained in step S706 is greater than the value of P_1, the similar drug determination/learning unit 112 assigns the current value of P_2 to P_3, assigns the current value of P_1 to P_2, and assigns the score to P_1. Alternatively, if the score is less than or equal to the value of P_1 and greater than the value of P_2, it assigns the current value of P_2 to P_3 and assigns the score to P_2. Alternatively, if the score is less than or equal to the value of P_2 and greater than the value of P_3, it assigns the score to P_3.

When the update of the variables P_1 to P_3 is finished as described above, the process returns to step S704.
In step S711, the similar drug determination/learning unit 112 performs a score normalization process using the variables P_1, P_2, and P_3 as arguments. The score normalization process in step S711 is the same as the one that the side effect determination/learning unit 108 performs in step S210 of FIG. 6 according to the flowchart of FIG. 10, except that the entity performing it is the similar drug determination/learning unit 112. A detailed description is therefore omitted here.

Then, in the next step S712, the similar drug determination/learning unit 112 returns the score normalized in step S711 as the return value of the process of FIG. 13, and the process of FIG. 13 ends. That is, in step S613 of FIG. 12, which corresponds to the process of FIG. 13, the similar drug determination/learning unit 112 obtains the normalized score as the similarity.
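
The overall flow of steps S701 to S712 can be sketched as follows. This is a sketch under stated assumptions, not the embodiment: `score_fn`, `reference_fn`, and `normalize_fn` are hypothetical callables standing in for the score calculation of FIG. 7, the lookup in the reference value information 502, and the normalization of FIG. 10, none of which is reproduced in this passage. For brevity, a sorted list replaces the explicit shuffling of P_1 to P_3.

```python
from itertools import product


def efficacy_similarity(k1, k2, score_fn, reference_fn, normalize_fn, synonyms):
    """Sketch of the efficacy/effect similarity calculation of FIG. 13.

    k1, k2:   efficacy/effect keyword groups K_1 and K_2.
    synonyms: a set standing in for the synonym dictionary 203.
    """
    top = []  # S701: holds at most three scores, highest first
    for w1, w2 in product(k1, k2):  # S704/S705: visit each keyword combination once
        score = score_fn(w1, w2)  # S706: score from the keyword similarity evaluation
        if score >= reference_fn(w1, w2):  # S707: matching criterion satisfied
            synonyms.add((w1, w2))  # S708: learn the pair as synonyms
            # S709/S710: keep only the three highest scores seen so far
            top = sorted(top + [score], reverse=True)[:3]
    return normalize_fn(top)  # S711/S712: normalize and return the similarity
```

Passing the scoring and normalization steps in as parameters keeps the sketch independent of details that are described elsewhere in the specification.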

According to the preprocessing of (c4) above, described with reference to FIGS. 11 to 13, the fields of the learning result table 204 other than the “known side effect list” field are learned in advance. Therefore, even if safety information report documents 205 concerning a particular in-house drug are received from multiple medical institutions, the similar drug processing unit 103 needs to learn the similar drugs of that in-house drug only once, in the preprocessing.

The efficacy/effect similarity calculation process of FIG. 13 also has the following features (e1) and (e2).
(e1) Unlike a general algorithm that calculates the similarity between documents according to a vector space model using TF-IDF values, the algorithm illustrated in FIG. 13 is devised to be suitable for determining, on the basis of efficacy or effect, whether drugs are similar drugs. That is, the algorithm shown in FIG. 13 is tailored to the determination of similar drugs by exploiting two facts: it is not unusual for a drug to have multiple efficacies or effects, and drugs that show high similarity in some of their efficacies or effects can be regarded as similar drugs.

For example, a first drug may be effective against diseases X and Y, and a second drug may be effective against diseases Y and Z. In this case, the effects of the first and second drugs are not necessarily highly similar as a whole. Hence, a general algorithm that calculates the similarity between documents according to a vector space model using TF-IDF values does not necessarily yield the result that “the first and second drugs are similar”.

On the other hand, the first and second drugs can be regarded as similar drugs in that both are effective against disease Y. With the process of FIG. 13, the similar drug determination/learning unit 112 can conclude, from the similarity between the descriptions stating effectiveness against disease Y, that “the first and second drugs are similar drugs of each other”. The reason is as follows.

The efficacy/effect keyword groups corresponding to the first and second drugs each include keywords relating to disease Y. Therefore, according to this embodiment, the combination of keywords relating to disease Y is determined in step S707 of FIG. 13 to “satisfy the matching criterion”. Consequently, the score normalization process is performed using the score indicating the similarity between the keywords relating to disease Y. Thus, if the threshold γ_1 shown in FIG. 12 is set appropriately, the similar drug determination/learning unit 112 can determine that “the first and second drugs are similar drugs of each other”.

Depending on the embodiment, the similar drug determination/learning unit 112 may use TF-IDF values in a supplementary manner. That is, the similar drug determination/learning unit 112 may use TF-IDF values to obtain feature vectors for the “efficacy or effect” sections of the attached documents 202 of the learning target drug and the selected drug, and may calculate the closeness of the feature vectors (for example, the angle between them). The similar drug determination/learning unit 112 may then determine in step S614 of FIG. 12 whether the learning target drug and the selected drug are similar drugs of each other on the basis of both the closeness of the feature vectors and the similarity obtained by the process of FIG. 13.
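
The supplementary TF-IDF comparison can be sketched as follows. This is an illustration only: the weighting (term frequency times log(N / document frequency)), the use of cosine similarity as the measure of closeness, the toy term lists, and all names are assumptions. How the result would be combined with the similarity of FIG. 13 is not specified in this passage and is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document; each doc is a list of terms."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        if not doc:
            vectors.append({})
            continue
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy "efficacy or effect" sections: drug 1 treats X and Y, drug 2 treats Y and Z.
docs = [["X", "Y"], ["Y", "Z"], ["W"]]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))
```

In this toy setting the two drugs share only disease Y, so the cosine is well below 1, which illustrates why a whole-document measure alone may not flag them as similar.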

(e2) In the process of FIG. 13, matches between longer keywords are given more weight. That is, as is clear from the score calculation process of FIG. 7, the score naturally becomes higher the longer the matching keywords are, yet step S709 of FIG. 13 takes no account of differences in keyword length and uses only the magnitude of the score as the criterion. Therefore, when there are matching combinations of long keywords, matches between short keywords may have no effect at all on the processing from step S711 onward.

For example, suppose there are three pairs of 10-byte keywords that match exactly between the learning target drug and the selected drug. Then, no matter how many pairs of exactly matching 4-byte keywords there are in addition, the scores obtained from those 4-byte matches are not considered from step S711 onward. Consequently, the matches between the 4-byte keywords have no influence on the determination that the similar drug determination/learning unit 112 makes in step S614 of FIG. 12 using the similarity obtained by the process of FIG. 13.

Thus, in this embodiment, matches between long keywords are regarded as more important than matches between short keywords and are considered preferentially. For the following reasons, this can be said to be a feature well suited to the determination of similar drugs.

As a general tendency, longer keywords more often denote specific, concrete content. Therefore, if long keywords match between the learning target drug and the selected drug, it is highly probable that the two drugs have the same kind of effect on a limited, specific disease or symptom. In other words, if long keywords match, it is highly probable that the learning target drug and the selected drug are similar drugs. In this embodiment, therefore, so that matches between longer keywords are given more weight in line with this probability, step S709 of FIG. 13 deliberately performs no normalization by keyword length and simply uses only the magnitude of the score as the criterion.

Next, the preprocessing of (c5) above, which the determination apparatus 100 performs when an attached document 202 is added or updated, is described with reference to FIG. 14.
FIG. 14 is a flowchart of the addition/update process. In this embodiment, the preprocessing control unit 113 starts the addition/update process upon an input instructing the start of the process. The input also includes designation of the following information (f1) and (f2).

(f1) The ID (hereinafter “target drug ID”) of the drug (hereinafter “target drug”) whose attached document 202 is to be added or updated.
(f2) New attached document specifying information, which identifies the new attached document to be added or updated. For example, when the file of the new attached document has already been created on the storage device 307, the new attached document specifying information may be the path of that file. Alternatively, when the file of the new attached document is on another computer 312, the new attached document specifying information may be a Uniform Resource Identifier (URI).

When the target drug ID and the new attached document specifying information are input via the input device 305 and the start of the addition/update process is instructed, the preprocessing control unit 113 starts the addition/update process. In step S801, the preprocessing control unit 113 searches the learning result table 204 using the input target drug ID as the search key and checks whether the learning result table 204 contains an entry whose ID is the input target drug ID.

If the learning result table 204 contains an entry whose ID is the input target drug ID, the start of the addition/update process has been instructed in order to update an already registered attached document 202, so the process proceeds to step S802. If the learning result table 204 contains no such entry, the start of the addition/update process has been instructed in order to add an attached document 202 for a new drug, so the process proceeds to step S803.

In step S802, the preprocessing control unit 113 replaces the attached document 202 of the target drug with the new attached document identified by the new attached document specifying information. The process then proceeds to step S805.

In step S803, the preprocessing control unit 113 adds the new attached document identified by the new attached document specifying information to the attached document group 201 as the attached document 202 of the target drug. The process then proceeds to step S804. Note that, as described above, the file name of an attached document 202 in this embodiment includes the ID of the drug, so in steps S802 and S803 the preprocessing control unit 113 also renames files as appropriate.

In step S804, the preprocessing control unit 113 adds an entry for the target drug to the learning result table 204. That is, the preprocessing control unit 113 adds to the learning result table 204 an entry whose “ID” field is set to the target drug ID designated via the input device 305 and whose other fields are initialized to empty. The process then proceeds to step S805.

In step S805, the preprocessing control unit 113 instructs the efficacy/effect keyword extraction unit 111 to extract keywords from the “efficacy or effect” section of the attached document 202 of the target drug and register them in the learning result table 204. The efficacy/effect keyword extraction unit 111 extracts the keywords and registers them in the learning result table 204 as instructed. Step S805 is similar to step S504 of FIG. 11 and differs only in which drug's attached document 202 is the target of keyword extraction, so a detailed description is omitted.

Then, in the next step S806, the preprocessing control unit 113 instructs the side effect keyword extraction unit 107 to extract keywords from the “side effect” section of the attached document 202 of the target drug and register them in the learning result table 204. The side effect keyword extraction unit 107 extracts the keywords and registers them in the learning result table 204 as instructed. Step S806 is similar to step S505 of FIG. 11 and differs only in which drug's attached document 202 is the target of keyword extraction, so a detailed description is omitted. Steps S805 and S806 may be executed in the reverse order.

Further, in the next step S807, the preprocessing control unit 113 obtains the number of drugs for which an attached document 202 is registered and stores that number in a variable N.
Then, in the next step S808, the preprocessing control unit 113 instructs the similar drug determination/learning unit 112 to learn the similar drugs of the target drug, and the similar drug determination/learning unit 112 learns them. That is, the process that the similar drug determination/learning unit 112 performs in step S808 is the similar drug learning process shown in FIG. 12. In step S808, the preprocessing control unit 113 also notifies the similar drug determination/learning unit 112 of the target drug ID designated at the start of the addition/update process as the learning target drug ID, and designates the first through the N-th drugs as the range to be compared with the learning target drug.

Note that, as shown in step S601 of FIG. 12, the similar drug determination/learning unit 112 initializes the similar drug learning list using the similar drug list already learned in the learning result table 204. Therefore, when the addition/update process of FIG. 14 is performed to update an attached document 202, the contents of the target drug's similar drug list already learned in the learning result table 204 are not lost.
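
The addition/update flow of steps S801 to S808 can be sketched as follows. All names here are hypothetical: the learning result table 204 and the attached document group 201 are modeled as dictionaries keyed by drug ID, the field names are invented for illustration, and the two extractors and the similar drug learner are passed in as callables because their details are described elsewhere.

```python
def add_or_update(table, documents, target_id, new_document,
                  extract_efficacy, extract_side_effects, learn_similar):
    """Sketch of the addition/update process of FIG. 14.

    table:     dict drug ID -> entry dict (stand-in for the learning result table 204).
    documents: dict drug ID -> attached document (stand-in for the document group 201).
    """
    entry = table.get(target_id)  # S801: is the target drug already registered?
    if entry is None:
        # S803/S804: new drug, so create an entry with empty fields.
        entry = {"efficacy_keywords": [], "side_effect_keywords": [], "similar_drugs": []}
        table[target_id] = entry
    # S802 (replace) or S803 (add): store the new attached document.
    documents[target_id] = new_document
    entry["efficacy_keywords"] = extract_efficacy(new_document)  # S805
    entry["side_effect_keywords"] = extract_side_effects(new_document)  # S806
    n = len(documents)  # S807: number of drugs with a registered attached document
    learn_similar(target_id, 1, n)  # S808: compare against the 1st through N-th drugs
    return n
```

Because an existing entry is reused rather than recreated, a previously learned similar drug list survives an update, in line with the note on step S601.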

As described above, according to this embodiment, the similar drug processing unit 103 learns in advance the similar drug list for the determination target drug identified by the determination target drug ID 206. At the preprocessing stage it has not yet been decided which drug will be the determination target drug, but the preprocessing can be restated from the viewpoint of the determination target drug as follows.

That is, the efficacy/effect keyword extraction unit 111 is an example of efficacy/effect phrase extraction means that performs phrase extraction on the part describing efficacy or effect in the attached document 202 of each of the determination target drug and the drugs other than the determination target drug, thereby obtaining efficacy/effect phrase sets. In this embodiment, the obtained efficacy/effect phrase sets are stored in the “efficacy/effect keyword group” field of the learning result table 204.

The similar drug determination / learning unit 112 is an example of first similar drug determination means that determines, for each of at least some of the drugs other than the determination target drug, whether that drug is a similar drug of the determination target drug. As the first similar drug determination means, the similar drug determination / learning unit 112 has the keyword similarity evaluation unit 109 evaluate combinations of a phrase included in the efficacy / effect phrase set acquired for the other drug and a phrase included in the efficacy / effect phrase set acquired for the determination target drug.

Furthermore, the similar drug determination / learning unit 112 as the first similar drug determination means aggregates the evaluations of a plurality of such combinations to calculate a value indicating the similarity in efficacy or effect between the determination target drug and the other drug currently focused on as the selected drug. Specifically, as the aggregation of the evaluations by the keyword similarity evaluation unit 109, the similar drug determination / learning unit 112 picks out the three highest scores among the phrases that satisfy the matching criterion and normalizes them.

The similar drug determination / learning unit 112 as the first similar drug determination means then compares the value indicating the similarity, calculated by the aggregation described above, with the threshold γ_1. When the similarity indicated by the calculated value is higher than the similarity indicated by the threshold γ_1, the similar drug determination / learning unit 112 determines that the other drug is a similar drug of the determination target drug. Therefore, the similar drug recognition unit 105, which recognizes the similar drugs of the determination target drug by referring to the similar drug list, can be said to recognize similar drugs in accordance with the determination results of the similar drug determination / learning unit 112 as the first similar drug determination means.
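The aggregation and threshold comparison described above can be illustrated with a minimal Python sketch. The function names, the argument `max_score`, and in particular the normalization formula (sum of the three best scores divided by three times the maximum attainable score) are assumptions made for illustration; this passage only states that the top three scores are selected and normalized.

```python
def efficacy_similarity(scores, max_score, top_n=3):
    """Aggregate per-phrase scores into one value between 0 and 1.

    scores: scores of phrases that already satisfied the matching criterion.
    The normalization used here is an assumption, not taken from the source.
    """
    best = sorted(scores, reverse=True)[:top_n]
    if not best or max_score <= 0:
        return 0.0
    return sum(best) / (top_n * max_score)


def is_similar_drug(scores, max_score, gamma_1):
    """Return True when the aggregated similarity is higher than gamma_1."""
    return efficacy_similarity(scores, max_score) > gamma_1
```

With this assumed normalization, fewer than three matching phrases lower the value, so a drug sharing only one efficacy phrase is less likely to exceed γ_1.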

It was stated above that the determination by the similar drug determination / learning unit 112 as the first similar drug determination means is performed for at least some of the drugs other than the determination target drug. The meaning of "at least some" is as follows.

That is, in the present embodiment, the determination as the first similar drug determination means is performed in steps S613 and S614 of FIG. 12. Step S613 is not necessarily executed for every drug.

In other words, the similar drug determination / learning unit 112 also functions as second similar drug determination means that determines, among the plurality of drugs other than the determination target drug, a drug whose therapeutic category name, standard name, generic name, chemical name, or structural formula described in the attached document 202 matches that of the determination target drug to be a similar drug. The similar drug determination / learning unit 112 performs steps S607 to S612 of FIG. 12 as the second similar drug determination means, and performs steps S613 to S614 as the first similar drug determination means only for drugs that the second similar drug determination means did not determine to be similar drugs.

Therefore, the drugs judged by the similar drug determination / learning unit 112 as the first similar drug determination means are not necessarily all of the drugs other than the determination target drug, which is why the above description says "at least some". Limiting the targets of steps S613 to S614, which are more complex than steps S607 to S612, to a subset of the drugs in this way reduces unnecessary processing load.
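The two-stage determination can be sketched as follows. This is a hedged illustration only: the dictionary keys in `MATCH_FIELDS`, the record layout, and the callable `similarity` (standing in for the efficacy / effect similarity computed in steps S613 to S614) are hypothetical names, not part of the embodiment.

```python
# Hypothetical field names for the items compared by the second means.
MATCH_FIELDS = (
    "therapeutic_category",
    "standard_name",
    "generic_name",
    "chemical_name",
    "structural_formula",
)


def matches_by_field(target, other):
    """Second means (cheap): any listed field is present and identical."""
    return any(
        target.get(f) is not None and target.get(f) == other.get(f)
        for f in MATCH_FIELDS
    )


def learn_similar_drugs(target, others, similarity, gamma_1):
    """Run the costly similarity check only when the cheap check fails."""
    similar = []
    for other in others:
        if matches_by_field(target, other):
            similar.append(other["id"])
        elif similarity(target, other) > gamma_1:  # first means (costly)
            similar.append(other["id"])
    return similar
```

Because `similarity` is never called for drugs already accepted by the field comparison, the number of costly evaluations shrinks as more drugs match on a simple field.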

The present invention is not limited to the above embodiment. Some modifications have already been described above, and the embodiment can be further modified in various ways, for example from the viewpoints (g1) to (g8) below. These modifications can be combined arbitrarily as long as they do not contradict each other.

(g1) Modification of the score calculation process of FIG. 7
Depending on the keyword extraction algorithm, the score calculation process of FIG. 7 performed by the keyword similarity evaluation unit 109 may be modified, and the data formats of the efficacy / effect keyword group and the side effect keyword group in the learning result table 204 may be modified accordingly.

Specifically, when the side effect keyword extraction unit 107 extracts keywords using morphological analysis, the score calculation process of FIG. 7 that the keyword similarity evaluation unit 109 performs in step S205 of FIG. 6 may be modified as follows. Similarly, when the efficacy / effect keyword extraction unit 111 extracts keywords using morphological analysis, the score calculation process of FIG. 7 that the keyword similarity evaluation unit 109 performs in step S706 of FIG. 13 may be modified as follows.

That is, the score calculation process of FIG. 7 may be modified so that the positions at which a keyword is divided into substrings are limited to morpheme boundaries.
The algorithm of the score calculation process of FIG. 7 follows a policy of exhaustively examining all possible division patterns for both keywords A and B. Therefore, in the score calculation process of FIG. 7, any position within a keyword can be a division position between substrings.

In other words, to obtain the similarity between two phrases, the keyword similarity evaluation unit 109 aggregates the evaluations for all combinations of the division patterns that are possible when each phrase may be divided between any two adjacent characters. In the example of FIG. 7, "aggregation" here specifically means selecting the highest evaluation.

Therefore, in the embodiment shown in FIG. 7, the keyword similarity evaluation unit 109 considers a large number of combinations of division patterns, and its processing load is high.

In contrast, in the score calculation process modified so that division positions are limited to morpheme boundaries, the number of division patterns for each of keywords A and B is limited, so the number of combinations is also kept small. The processing load of the keyword similarity evaluation unit 109 is reduced accordingly.

For example, the keyword 601 "general anesthetic" (全身麻酔剤) illustrated in FIG. 9 has five characters, so the score calculation process of FIG. 7 considers 16 (= 2^(5−1)) division patterns for the keyword 601.

On the other hand, if the keyword 601 was extracted using the result of morphological analysis, the number of division patterns possible for the keyword 601 in the modified score calculation process is very small.

For example, suppose that the keyword 601 "general anesthetic" (全身麻酔剤) was obtained from the result of morphological analysis as a sequence of the noun 全身 ("whole body"), the noun 麻酔 ("anesthesia"), and the noun 剤 ("agent"). The keyword 601 then contains only two morpheme boundaries, so the modified score calculation process allows only 4 (= 2^2) division patterns for the keyword 601. That is, in this example the only possible division patterns are the three division patterns 603a to 603c shown in FIG. 9 and the division pattern 全身／麻酔剤 ("whole body / anesthetic"), four in total.
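The two counts above (16 and 4) can be checked with a short Python sketch that enumerates division patterns. The function name `division_patterns` and the representation of cut positions as character offsets are assumptions for illustration; the embodiment does not prescribe a particular enumeration method.

```python
from itertools import combinations


def division_patterns(keyword, allowed_cuts=None):
    """Enumerate every way of splitting keyword into contiguous substrings.

    allowed_cuts: character offsets where a split is permitted.
    None means a split is permitted between any two adjacent characters.
    """
    n = len(keyword)
    cuts = list(range(1, n)) if allowed_cuts is None else sorted(allowed_cuts)
    patterns = []
    for k in range(len(cuts) + 1):
        for chosen in combinations(cuts, k):
            bounds = [0, *chosen, n]
            patterns.append([keyword[a:b] for a, b in zip(bounds, bounds[1:])])
    return patterns


unrestricted = division_patterns("全身麻酔剤")             # 2**(5-1) patterns
morpheme_only = division_patterns("全身麻酔剤", [2, 4])    # 2**2 patterns
```

Since the score calculation considers pairs of patterns, one from each keyword, the reduction compounds: two five-character keywords yield 16 × 16 pairs without the restriction, but only 4 × 4 when each has two morpheme boundaries.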

Therefore, compared with the score calculation process of FIG. 7, the modified score calculation process reduces the computational load on the keyword similarity evaluation unit 109. Conversely, the score calculation process of FIG. 7 is superior to the modified process in its robustness against unknown words that are not registered in the dictionary used for morphological analysis.

The learning result table 204 may also be modified so that the keyword similarity evaluation unit 109 can perform the modified score calculation process. For example, the learning result table 204 may be modified to include information indicating the morpheme boundary positions in each keyword that the side effect keyword extraction unit 107 extracted using the result of morphological analysis. The information indicating the morpheme boundary positions may, for example, be stored in a new field of the learning result table 204 or be included in the "side effect keyword group" field itself.

For example, the side effect keyword extraction unit 107 may record, as each element of the side effect keyword group in the learning result table 204, a keyword in which the morpheme boundary positions are indicated by an appropriate delimiter character or by markup following a predetermined grammar. Alternatively, the side effect keyword extraction unit 107 may record information indicating the morpheme boundary positions of each keyword in the side effect keyword group in a new field of the learning result table 204.
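As one possible reading of the delimiter approach, the following Python sketch stores a keyword with its morpheme boundaries marked and recovers the plain keyword and boundary offsets. The choice of "/" as the delimiter and the function names are assumptions; the sketch also assumes the delimiter never occurs inside a keyword.

```python
def encode_keyword(morphemes, delimiter="/"):
    """Join morphemes into one stored string with boundaries marked."""
    return delimiter.join(morphemes)


def decode_keyword(marked, delimiter="/"):
    """Recover the plain keyword and the character offsets of its boundaries."""
    morphemes = marked.split(delimiter)
    positions = []
    offset = 0
    for morpheme in morphemes[:-1]:
        offset += len(morpheme)
        positions.append(offset)
    return "".join(morphemes), positions


stored = encode_keyword(["全身", "麻酔", "剤"])
keyword, boundaries = decode_keyword(stored)
```

Storing the offsets in a separate field instead, as the alternative above describes, would leave the existing keyword field unchanged for readers that do not use morpheme boundaries.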

Of course, the efficacy / effect keyword extraction unit 111 and the "efficacy / effect keyword group" field can be modified in the same manner.
If the learning result table 204 holds information indicating the morpheme boundary positions, the side effect determination / learning unit 108 or the similar drug determination / learning unit 112 can easily read that information from the learning result table 204.

Therefore, in step S205 of FIG. 6, the side effect determination / learning unit 108 can notify the keyword similarity evaluation unit 109 of the information indicating the morpheme boundary positions in the selected keyword.

In addition, when the side effect keyword extraction unit 107 has extracted the selected side effect from the safety information report document 205 by morphological analysis in step S104 of FIG. 3, it can also notify the side effect determination / learning unit 108 of information indicating the morpheme boundary positions in the selected side effect. The side effect determination / learning unit 108 can then pass the information received from the side effect keyword extraction unit 107 to the keyword similarity evaluation unit 109, so that the keyword similarity evaluation unit 109 can recognize the morpheme boundary positions in the selected side effect.

Similarly, in step S706 of FIG. 13, the similar drug determination / learning unit 112 can notify the keyword similarity evaluation unit 109 of information indicating the morpheme boundary positions in each of the keywords W_1 and W_2.

Therefore, in the modified score calculation process, the keyword similarity evaluation unit 109 can recognize the information indicating the morpheme boundary positions in each of the two keywords to be processed. As described above, by limiting the positions at which a keyword is divided into substrings to morpheme boundaries, the keyword similarity evaluation unit 109 can then calculate the score indicating the similarity between the two keywords with a relatively small amount of computation.

The keyword similarity evaluation unit 109 may also perform steps S306 to S314 of FIG. 7 only for combinations of division patterns that contain no substring shorter than a predetermined lower limit. For example, the lower limit may be set to "2 characters" or "4 bytes". This reduces the amount of computation, and it also eliminates the influence of points added as noise by a single-character match, for example the character 臓 shared by a keyword containing 心臓 ("heart") and an entirely different keyword containing 腎臓 ("kidney").
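The lower-limit filter can be sketched in Python as follows. The names `has_no_short_part` and `eligible_pairs`, and the representation of a division pattern as a list of substrings, are assumptions made for illustration; the lower limit is expressed here in characters rather than bytes.

```python
def has_no_short_part(pattern, min_chars=2):
    """True when every substring in the pattern meets the lower limit."""
    return all(len(part) >= min_chars for part in pattern)


def eligible_pairs(patterns_a, patterns_b, min_chars=2):
    """Keep only pattern pairs in which neither side has a short substring."""
    kept_a = [p for p in patterns_a if has_no_short_part(p, min_chars)]
    kept_b = [p for p in patterns_b if has_no_short_part(p, min_chars)]
    return [(pa, pb) for pa in kept_a for pb in kept_b]


heart = [["心臓"], ["心", "臓"]]
kidney = [["腎臓"], ["腎", "臓"]]
pairs = eligible_pairs(heart, kidney)
```

With a two-character lower limit, the patterns that isolate 臓 are never evaluated, so the single-character match between the two keywords cannot contribute to the score.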

(g2) Modification of the timing of similar drug learning
In the above embodiment, the similar drug learning process of FIG. 12 is performed in step S509 of FIG. 11 and in step S808 of FIG. 14. That is, similar drugs are learned before the process of FIG. 3 is executed. In other words, in the above embodiment, the similar drug recognition unit 105 recognizes the similar drugs of the determination target drug by reading, from the learning result table 204 of the storage unit 101, similar drug learning result information that associates identification information uniquely identifying a drug with the similar drugs of that drug.

However, depending on the embodiment, the similar drug recognition unit 105 may include the similar drug processing unit 103. In that case, when the side effect processing unit 102 performs the process of FIG. 3, the similar drug processing unit 103 within the similar drug recognition unit 105 may perform the similar drug learning process of FIG. 12.

That is, instead of performing the similar drug learning process in advance as in the above embodiment, the similar drug processing unit 103 may perform it on the spot when the side effect processing unit 102 performs the process of FIG. 3. Specifically, the similar drug learning process of FIG. 12 may be performed immediately before step S102 of FIG. 3.

In that case, for example, the similar drug recognition unit 105 may designate the determination target drug ID 206 to the similar drug determination / learning unit 112 as the learning target drug ID. The similar drug recognition unit 105 also designates all of the first through N-th attached documents 202 to the similar drug determination / learning unit 112 as the range to be compared, in the same way as the preprocessing control unit 113 does in step S808 of FIG. 14.

When the similar drug determination / learning unit 112 finishes the similar drug learning process of FIG. 12, the similar drug recognition unit 105 can acquire the learned similar drug list in step S102. Alternatively, instead of recording the result of the similar drug learning process in the learning result table 204, the similar drug determination / learning unit 112 may notify the similar drug recognition unit 105 of the result directly. That is, depending on the embodiment, the similar drug list field of the learning result table 204 may be omitted.

In summary, when the similar drug recognition unit 105 that realizes the similar drug recognition means includes the similar drug processing unit 103, and the similar drug processing unit 103 performs the similar drug learning process immediately before step S102, the similar drug recognition unit 105 recognizes similar drugs by reading the attached document 202 of each of a plurality of drugs. That is, the similar drug processing unit 103, as part of the similar drug recognition means, reads the attached document 202 of each of the plurality of drugs from the storage unit 101 and performs the similar drug learning process using the attached documents 202 thus read. As a result of the similar drug learning process, the similar drug recognition unit 105 including the similar drug processing unit 103 can recognize, among the plurality of other drugs, the similar drugs that resemble the determination target drug.

The addition of an entry to the synonym dictionary 203 by the side effect determination / learning unit 108 or the similar drug determination / learning unit 112 may affect the determination of whether two given drugs are similar drugs. This is because the process of FIG. 13 is performed in step S613 of the similar drug learning process of FIG. 12, and the synonym dictionary 203 is referred to in step S309 of the score calculation process of FIG. 7 that the keyword similarity evaluation unit 109 performs in step S706 of FIG. 13. Therefore, the preprocessing control unit 113 may instruct the similar drug determination / learning unit 112 to re-execute the similar drug learning process, either irregularly when an entry is added to the synonym dictionary 203 or periodically at appropriate intervals.

(g3) Modification of the device configuration
In the above embodiment, the determination device 100 includes both the side effect processing unit 102 and the similar drug processing unit 103, but a first device including the similar drug processing unit 103 and a second device including the side effect processing unit 102 may be provided separately. In that case, the first device may learn the fields of the learning result table 204 other than the known side effect list by preprocessing, and the second device may receive from the first device the data of the learning result table 204 obtained as a result of the learning.

The second device can store the received data of the learning result table 204 in a storage device that it can access, and refer to that data. The second device can therefore execute the process of FIG. 3 without performing the preprocessing itself.

In the determination device 100 of FIG. 1, the keyword similarity evaluation unit 109 is shared between the side effect processing unit 102 and the similar drug processing unit 103. However, depending on the embodiment, the side effect processing unit 102 and the similar drug processing unit 103 may each be provided with a separate keyword similarity evaluation unit 109.

In FIG. 1 the storage unit 101 is inside the determination device 100, but the storage unit 101 may be outside the determination device 100. For example, the components of the determination device 100 other than the storage unit 101 may be realized by the computer 300 of FIG. 2, and the storage unit 101 may be realized by a storage device of the other computer 312 of FIG. 2.

(g4) Modification of the user interface
The user interfaces illustrated for the above embodiment are merely examples.
For example, the side effect determination / learning unit 108 may cause the output device 306 to display a screen in a format other than the side effect determination result screen 400 shown in FIG. 5. Alternatively, the side effect determination / learning unit 108 may output the determination result for each selected side effect via a Command-Line Interface (CLI) instead of a Graphical User Interface (GUI). For a side effect that it has determined to be possibly known, the side effect determination / learning unit 108 can also receive the user's input as to whether the side effect is known or unknown, either via a GUI such as the side effect determination result screen 400 of FIG. 5 or via a CLI.

Similarly, the user interface through which the similar drug determination / learning unit 112 accepts the user's judgment as to whether two given drugs are similar drugs is arbitrary depending on the embodiment, and may be a GUI or a CLI.

In the example of FIG. 3, the side effect determination / learning unit 108 obtains, in step S112, the similarity between the selected side effect and the side effect keyword group of each of the similar drugs of the determination target drug, for all of those similar drugs. However, depending on the embodiment, the side effect determination / learning unit 108 may perform step S112 only for some of the similar drugs of the determination target drug. A case in which the determination target drug has three similar drugs is described below as a specific example.

The side effect determination / learning unit 108 first performs step S112 for the first similar drug, and if the resulting similarity is α_1 or more, it can determine that "the selected side effect is a known side effect". Therefore, after performing steps S113 and S114, the side effect determination / learning unit 108 may execute step S116 immediately without returning to step S110. That is, in this case, the side effect determination / learning unit 108 may omit, for the selected side effect, the comparison of the second and third similar drugs with the determination target drug.

In some cases, the similarity obtained by step S112 for the first similar drug is less than α_1, while the similarity obtained by step S112 for the second similar drug is α_1 or more. In that case, the side effect determination / learning unit 108 may perform step S114 after the determination of step S113 for the second similar drug, and then immediately execute step S116.

That is, the side effect determination / learning unit 108 considers the similar drugs one after another until a similarity of α_1 or more is obtained, and once a similarity of α_1 or more is obtained for some similar drug, it may omit steps S111 to S115 for the remaining similar drugs.
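The early termination described above can be sketched in Python as follows. This is a simplified illustration, not the flow of FIG. 3 itself: the function name `find_known_side_effect` and the callable `similarity` (standing in for step S112) are hypothetical, and the handling of similarities below α_1 in steps S113 to S115 is omitted.

```python
def find_known_side_effect(selected_side_effect, similar_drugs, similarity, alpha_1):
    """Check similar drugs in order and stop at the first one reaching alpha_1.

    Returns (True, drug) for the first similar drug whose similarity is
    alpha_1 or more, or (False, None) when no similar drug reaches it.
    """
    for drug in similar_drugs:
        if similarity(drug, selected_side_effect) >= alpha_1:
            return True, drug  # remaining similar drugs are skipped
    return False, None
```

In the three-drug example, a second similar drug that reaches α_1 means the third is never evaluated, which saves one run of the score calculation per selected side effect.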

(g5) Modification of the data
Although FIG. 1 shows the synonym dictionary 203 in table form, the data format of the synonym dictionary 203 is arbitrary. The data represented by the learning result table 204 of FIG. 4 can also be represented in any data format other than a table. For example, the synonym dictionary 203 and the learning result table 204 may be realized by an eXtensible Markup Language (XML) database.

The synonym dictionary 203 can be omitted. In that case, the process of adding entries to the synonym dictionary 203 can also be omitted, and in step S309 of FIG. 7 the substring similarity evaluation unit 110 searches only for substrings that exactly match the substring substr.

The above embodiment was described using an example in which each attached document 202 is a single file, but the entire attached document group 201 may be a single file. Alternatively, the learning result table 204 and the attached document group 201 as a whole may be realized by a single XML database file.

In the above embodiment, the brand name code is used as the ID of each drug, but data other than the brand name code (for example, a serial number that the preprocessing control unit 113 automatically assigns to each drug) may be used as the ID.

The specific numerical values in the scoring information 501 and the reference value information 502 shown in FIG. 8 are examples, and can be changed as appropriate depending on the embodiment. For example, according to the reference value information 502 of FIG. 8, the reference value is defined as a constant multiple (β times) of the maximum attainable score, but the reference value only needs to be no greater than the maximum attainable score and does not have to be a constant multiple of it. Depending on the embodiment, β may also be a variable parameter that the user can specify.
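The definition of the reference value as β times the maximum attainable score can be written as a small Python sketch with β as a user-adjustable parameter. The function name and the validation range are assumptions: the text only requires that the reference value not exceed the maximum attainable score, which for a non-negative score implies β ≤ 1; rejecting negative β is an added assumption.

```python
def reference_value(max_score, beta):
    """Reference value defined as beta times the maximum attainable score.

    beta is restricted to 0 <= beta <= 1 so that the result never exceeds
    max_score (the lower bound of 0 is an assumption of this sketch).
    """
    if not 0 <= beta <= 1:
        raise ValueError("beta must be between 0 and 1")
    return beta * max_score
```

Exposing β as a parameter lets a user trade off how strictly two substrings must match before they count toward the similarity.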

Some of the fields of the learning result table 204 shown in FIG. 4 may be omitted. For example, as described in (g2) above, the "similar drug list" field can be omitted depending on the embodiment.

Similarly, the "efficacy / effect keyword group" field and the "side effect keyword group" field can also be omitted depending on the embodiment. That is, instead of keeping these two fields, the efficacy / effect keyword extraction unit 111 or the side effect keyword extraction unit 107 can extract keywords each time a process that uses a keyword group is performed.

例えば、キーワード群を使う処理のたびに副作用キーワード抽出部１０７がキーワード抽出を行う場合には、副作用キーワード抽出部１０７によって比較対象集合取得手段を実現することができる。すなわち、比較対象集合取得手段としての副作用キーワード抽出部１０７は、類薬として認識された医薬品の添付文書２０２における副作用の記載部分に含まれる語句の集合を比較対象語句集合として取得する。 For example, when the side effect keyword extraction unit 107 performs keyword extraction each time a process using a keyword group is performed, the side effect keyword extraction unit 107 can realize a comparison target set acquisition unit. That is, the side effect keyword extraction unit 107 serving as a comparison target set acquisition unit acquires a set of words and phrases included in the side effect description part in the attached document 202 of the medicine recognized as a similar drug as a comparison target word set.

具体的には、比較対象集合取得手段としての副作用キーワード抽出部１０７は、類薬として認識された医薬品の添付文書２０２を格納部１０１から読み出す。そして、副作用キーワード抽出部１０７は、読み出した当該添付文書２０２における副作用の記載部分から、語句抽出処理により語句の集合を抽出することによって、比較対象語句集合を取得する。 Specifically, the side effect keyword extraction unit 107 serving as a comparison target set acquisition unit reads out from the storage unit 101 the attached document 202 of a medicine recognized as an analog. Then, the side effect keyword extraction unit 107 acquires a set of comparison target phrases by extracting a set of phrases from the side effect description part in the read attached document 202 by a phrase extraction process.

また、実施形態によっては、「既知副作用リスト」フィールドも省略可能である。ただし、上記実施形態では、様々な医療機関から同じ自社薬について同じ副作用が複数回報告される可能性を想定して、学習結果テーブル２０４には「既知副作用リスト」フィールドが設けられている。 In some embodiments, the “known side effect list” field can also be omitted. However, in the above embodiment, a “known side effect list” field is provided in the learning result table 204 on the assumption that the same side effect may be reported multiple times for the same in-house drug from various medical institutions.

つまり、ある副作用の発生が初めて医療機関から製薬会社に報告されてから、添付文書２０２の改訂が行われるまでの期間中は、当該副作用は周知ではないので、異なる医療機関がそれぞれ製薬会社に当該副作用の報告を行うかもしれない。その場合に、判定装置１００あるいはユーザが同じ判断を何度も繰り返さなくてもよいように、上記実施形態では学習結果テーブル２０４が「既知副作用リスト」フィールドを備え、副作用判定・学習部１０８が「既知副作用リスト」フィールドの学習を行う。 That is, during the period from when the occurrence of a side effect is first reported by a medical institution to the pharmaceutical company until the package insert 202 is revised, the side effect is not widely known, so different medical institutions may each report the same side effect to the pharmaceutical company. So that neither the determination apparatus 100 nor the user has to repeat the same judgment many times in that case, in the above embodiment the learning result table 204 includes the “known side effect list” field, and the side effect determination / learning unit 108 learns the “known side effect list” field.

なお、以上例示したようなフィールドの省略とは逆に、学習結果テーブル２０４は、図４にないフィールドをさらに有していてもよい。例えば、判定装置１００が製薬会社において運用される場合、学習結果テーブル２０４は、自社薬か他社薬かを示すフラグのフィールドを有していてもよい。すると、類薬の学習を行う学習対象薬を自社薬に限定することができるようになる。 Contrary to the omission of fields as exemplified above, the learning result table 204 may further include fields not shown in FIG. For example, when the determination apparatus 100 is operated in a pharmaceutical company, the learning result table 204 may have a flag field indicating whether it is an in-house drug or a competitor drug. Then, it becomes possible to limit learning target drugs for learning similar drugs to in-house drugs.

上記実施形態における図１１と１４の処理は、各医薬品がどの製薬会社の製品かによらない処理であり、判定装置１００が製薬会社以外の第３者機関で運用される場合にも適用可能な処理である。それに対し、判定装置１００が製薬会社で運用される場合は、安全性情報報告文書２０５は自社薬に関するもののみである。よって、他社薬同士が類薬か否かという情報は必要ではなく、自社薬同士が類薬か否か、自社薬と他社薬が類薬か否か、という情報さえあれば十分である。 The processes of FIGS. 11 and 14 in the above embodiment do not depend on which pharmaceutical company makes each drug, and they are also applicable when the determination apparatus 100 is operated by a third-party organization other than a pharmaceutical company. In contrast, when the determination apparatus 100 is operated by a pharmaceutical company, the safety information report documents 205 relate only to the company's own drugs. Therefore, information on whether other companies' drugs are similar to one another is unnecessary; it is sufficient to know whether in-house drugs are similar to one another and whether an in-house drug and another company's drug are similar.

そこで、例えば、前処理制御部１１３は、図１１のステップＳ５０９の類薬学習処理を類薬判定・学習部１１２に行わせる前に、ｉ番目の医薬品のＩＤに対応する学習結果テーブル２０４内のエントリにおける上記フラグの値を参照する。すると、前処理制御部１１３は、フラグの値から、ｉ番目の医薬品が自社薬か否かを判断することができる。 Therefore, for example, before causing the similar drug determination / learning unit 112 to perform the similar drug learning process of step S509 in FIG. 11, the preprocessing control unit 113 refers to the value of the above flag in the entry of the learning result table 204 that corresponds to the ID of the i-th drug. From the value of the flag, the preprocessing control unit 113 can then determine whether the i-th drug is an in-house drug.

そして、ｉ番目の医薬品が自社薬なら、前処理制御部１１３はステップＳ５０９のとおり類薬判定・学習部１１２に類薬学習処理を行わせる。他方、ｉ番目の医薬品が他社薬なら、ステップＳ５０９は省略される。 If the i-th drug is an in-house drug, the preprocessing control unit 113 causes the similar drug determination / learning unit 112 to perform similar drug learning processing as in step S509. On the other hand, if the i-th drug is another company's drug, step S509 is omitted.
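
The flag check described above can be sketched as follows. This is only an illustrative sketch: the class `LearningEntry`, its field `is_in_house`, and the callback `learn` (a stand-in for the similar drug learning process of step S509) are hypothetical names that do not appear in the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class LearningEntry:
    """Hypothetical, reduced view of one entry of the learning result table 204."""
    drug_id: str
    is_in_house: bool  # assumed flag: True for an in-house drug, False for another company's drug


def run_similar_drug_learning(
    entries: Iterable[LearningEntry],
    learn: Callable[[str], None],
) -> List[str]:
    """Call `learn` only for in-house drugs and return the IDs that were processed."""
    processed = []
    for entry in entries:
        if not entry.is_in_house:
            continue  # the learning step is skipped for another company's drug
        learn(entry.drug_id)
        processed.append(entry.drug_id)
    return processed
```

With entries for drugs A (in-house), B (other company), and C (in-house), `learn` would be called for A and C only.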

また、図１４の処理は次のように変形されてもよい。すなわち、前処理制御部１１３は、ステップＳ８０６の後で、対象薬ＩＤに対応する学習結果テーブル２０４のエントリにおいて上記フラグの値を参照し、対象薬が自社薬か否かを判断する。そして、対象薬が自社薬の場合は、図１４と同様に前処理制御部１１３はステップＳ８０７の処理を実行し、ステップＳ８０８で類薬判定・学習部１１２に類薬学習処理を行わせればよい。 Further, the process of FIG. 14 may be modified as follows. After step S806, the preprocessing control unit 113 refers to the value of the above flag in the entry of the learning result table 204 that corresponds to the target drug ID, and determines whether the target drug is an in-house drug. If the target drug is an in-house drug, the preprocessing control unit 113 executes step S807 as in FIG. 14 and, in step S808, causes the similar drug determination / learning unit 112 to perform the similar drug learning process.

他方、対象薬が他社薬の場合、前処理制御部１１３は、各自社薬について、当該自社薬を学習対象薬として図１２の類薬学習処理を行うよう類薬判定・学習部１１２に命令する。その際、前処理制御部１１３は、類薬学習処理を行う比較範囲を、図１４における対象薬（つまり添付文書２０２の追加又は更新があった他社薬）のみに限定するよう、類薬判定・学習部１１２に指定すればよい。 On the other hand, when the target drug is another company's drug, the preprocessing control unit 113 instructs the similar drug determination / learning unit 112 to perform, for each in-house drug, the similar drug learning process of FIG. 12 with that in-house drug as the learning target drug. In doing so, the preprocessing control unit 113 specifies to the similar drug determination / learning unit 112 that the comparison range of the similar drug learning process is limited to the target drug in FIG. 14 (that is, the other company's drug whose attached document 202 was added or updated).

以上のように前処理を変形することで、自社薬についてのみ効率よく類薬処理部１０３が類薬の学習を行うことが可能となる。 By modifying the preprocessing as described above, the similar drug processing unit 103 can efficiently learn similar drugs for in-house drugs only.

また、図４の例では学習結果テーブル２０４の既知副作用リストの要素は、既知と判定された副作用である。しかし、実施形態によっては、どの類薬を根拠として副作用が既知と判定されたのかを示す類薬ＩＤと当該副作用とのペアが、既知副作用リストの各要素として記録されてもよい。つまり、図３のステップＳ１１６又はＳ１１７における学習の際に、副作用判定・学習部１０８は、既知副作用学習リスト又は既知副作用候補リスト内のＩＤを副作用と対応づけたペアを、学習結果テーブル２０４の既知副作用リストに追加してもよい。 In the example of FIG. 4, each element of the known side effect list in the learning result table 204 is a side effect determined to be known. However, depending on the embodiment, each element of the known side effect list may be recorded as a pair consisting of the side effect and a similar drug ID indicating which similar drug served as the basis for determining the side effect to be known. That is, during the learning in step S116 or S117 in FIG. 3, the side effect determination / learning unit 108 may add, to the known side effect list of the learning result table 204, a pair that associates an ID in the known side effect learning list or the known side effect candidate list with the side effect.

（ｇ６）点数正規化処理に関する変形 (g6) Modification of the score normalization process
図１０の点数正規化処理は、上位３位までの点数を引数とする。しかし、Ｔを２以上の任意の整数として、点数正規化処理は、上位Ｔ位までの点数を引数とするように変形されてもよい。その場合も、点数正規化処理を行う副作用判定・学習部１０８又は類薬判定・学習部１１２は、ＮＵＬＬではなく具体的に値の与えられているｔ個（ｔ≦Ｔ）の引数を使って、ｔ個の引数の値の２乗平均平方根を、正規化した値として算出すればよい。 The score normalization process of FIG. 10 takes the top three scores as arguments. However, with T being an arbitrary integer of 2 or more, the score normalization process may be modified to take the top T scores as arguments. In that case as well, the side effect determination / learning unit 108 or the similar drug determination / learning unit 112 that performs the score normalization process uses the t arguments (t ≤ T) that have actually been given values rather than NULL, and calculates the root mean square of those t values as the normalized value.
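
A minimal sketch of this generalized normalization follows, assuming that NULL arguments are represented by `None`. The function name `normalize_scores` and the choice to return 0.0 when no argument has a value are assumptions made for illustration, not details taken from FIG. 10.

```python
import math
from typing import Optional, Sequence


def normalize_scores(top_scores: Sequence[Optional[float]]) -> float:
    """Root mean square of the t non-None values among the top-T scores."""
    given = [s for s in top_scores if s is not None]  # the t arguments that have values
    if not given:
        return 0.0  # assumption: with no scores at all, the normalized value is 0
    return math.sqrt(sum(s * s for s in given) / len(given))
```

For example, with T = 3 and arguments (3, 4, NULL), only t = 2 values are used and the result is the square root of (9 + 16) / 2.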

あるいは、副作用判定・学習部１０８又は類薬判定・学習部１１２は、複数の点数の２乗平均平方根ではなく、複数の点数の相加平均、相乗平均、重み付き平均などを、複数の点数を正規化した点数として求めてもよい。 Alternatively, instead of the root mean square of a plurality of scores, the side effect determination / learning unit 108 or the similar drug determination / learning unit 112 may obtain the arithmetic mean, geometric mean, weighted mean, or the like of the scores as the normalized score.
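
The alternative averages named above could be sketched as below. The function names and the weights are hypothetical, and each function assumes a non-empty list of non-negative scores from which NULL values have already been removed.

```python
import math
from typing import Sequence


def arithmetic_mean(scores: Sequence[float]) -> float:
    """Sum of the scores divided by their count."""
    return sum(scores) / len(scores)


def geometric_mean(scores: Sequence[float]) -> float:
    """n-th root of the product of n scores (assumes non-negative scores)."""
    return math.prod(scores) ** (1.0 / len(scores))


def weighted_mean(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average; the weights are an assumption, e.g. larger for higher-ranked scores."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)
```

Which average suits a given deployment depends on how strongly a single high score should dominate the normalized value; the root mean square weights high scores more heavily than the arithmetic mean does.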

（ｇ７）配点に関する変形 (g7) Modification of the point allocation
上記実施形態では、類薬判定・学習部１１２がキーワード類似度評価部１０９に行わせる点数計算処理と、副作用判定・学習部１０８がキーワード類似度評価部１０９に行わせる点数計算処理で、同じ図８の配点情報５０１が使われる。しかし、実施形態によっては、類薬判定・学習部１１２がキーワード類似度評価部１０９に行わせる点数計算処理用の配点情報とは別の配点情報が、副作用判定・学習部１０８がキーワード類似度評価部１０９に行わせる点数計算処理で用いられてもよい。 In the above embodiment, the same scoring information 501 of FIG. 8 is used both in the score calculation process that the similar drug determination / learning unit 112 causes the keyword similarity evaluation unit 109 to perform and in the score calculation process that the side effect determination / learning unit 108 causes the keyword similarity evaluation unit 109 to perform. However, depending on the embodiment, the score calculation process requested by the side effect determination / learning unit 108 may use scoring information different from that used in the score calculation process requested by the similar drug determination / learning unit 112.

その場合、２種類の配点情報が使い分けられるのにあわせて、２種類の基準値情報が使い分けられてもよい。つまり、図６のステップＳ２０６における一致の基準と、図１３のステップＳ７０７における一致の基準は別の基準でもよい。 In that case, two types of reference value information may also be used selectively, in line with the selective use of the two types of scoring information. That is, the matching criterion in step S206 of FIG. 6 and the matching criterion in step S707 of FIG. 13 may be different criteria.

また、図７のステップＳ３０９において、部分文字列同士が完全に一致する場合と同義語として一致する場合で、ステップ３１０で加算される配点が異なっていてもよい。つまり、図８の配点情報５０１は、長さごとに配点を定義する情報だが、配点情報５０１において、長さごとに、完全一致用の配点と同義語一致用の配点がそれぞれ定義されていてもよい。その場合、同じ長さに対応する完全一致用の配点は、同義語一致用の配点以上となるよう定義される。 Further, in step S309 of FIG. 7, the points added in step S310 may differ between the case where the partial character strings match exactly and the case where they match as synonyms. That is, the scoring information 501 of FIG. 8 defines points for each length, and the scoring information 501 may define, for each length, both points for an exact match and points for a synonym match. In that case, for the same length, the points for an exact match are defined to be equal to or greater than the points for a synonym match.
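
A scoring table with separate exact-match and synonym-match points might look like the sketch below. The point values in `POINTS` are hypothetical and are not the values of FIG. 8; using the length of the shorter string for a synonym match is likewise an assumption, since the text above does not say which length applies when synonyms differ in length.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical values: length -> (points for an exact match, points for a synonym match)
POINTS: Dict[int, Tuple[int, int]] = {1: (1, 1), 2: (4, 3), 3: (9, 7)}


def table_is_valid(points: Dict[int, Tuple[int, int]]) -> bool:
    """Check that, for every length, exact-match points >= synonym-match points."""
    return all(exact >= synonym for exact, synonym in points.values())


def substring_points(
    a: str,
    b: str,
    synonyms: Set[FrozenSet[str]],
    points: Dict[int, Tuple[int, int]] = POINTS,
) -> int:
    """Points added for one pair of partial character strings (0 if they do not match)."""
    if a == b:
        return points.get(len(a), (0, 0))[0]
    if frozenset((a, b)) in synonyms:
        # assumption: the shorter length selects the table row for a synonym match
        return points.get(min(len(a), len(b)), (0, 0))[1]
    return 0
```

With this table, an exact match of two 2-character strings adds 4 points, while a synonym match of the same length adds 3.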

（ｇ８）キーワード抽出に関する変形 (g8) Modification of keyword extraction
上記実施形態では、副作用キーワード抽出部１０７は添付文書２０２の「副作用」セクションからキーワードを抽出する。しかし、実施形態によっては、副作用キーワード抽出部１０７はさらに、添付文書２０２の「相互作用」セクションなどの他のセクションからもキーワードを抽出し、学習結果テーブル２０４の「副作用キーワード群」に加えてもよい。 In the above embodiment, the side effect keyword extraction unit 107 extracts keywords from the “side effect” section of the attached document 202. However, depending on the embodiment, the side effect keyword extraction unit 107 may further extract keywords from other sections of the attached document 202, such as the “interaction” section, and add them to the “side effect keyword group” of the learning result table 204.

同様に、効能・効果キーワード抽出部１１１は添付文書２０２の「効能又は効果」セクションだけではなく「薬効薬理」セクションなどの他のセクションからもキーワードを抽出し、学習結果テーブル２０４の「効能・効果キーワード群」に加えてもよい。 Similarly, the efficacy / effect keyword extraction unit 111 may extract keywords not only from the “efficacy or effect” section of the attached document 202 but also from other sections such as the “medicinal pharmacology” section, and add them to the “efficacy / effect keyword group” of the learning result table 204.

また、上記実施形態に関して、副作用キーワード抽出部１０７と効能・効果キーワード抽出部１１１がそれぞれ行うキーワード抽出のアルゴリズムをいくつか例示した。しかし、キーワード抽出のアルゴリズムは上記に例示したものに限らない。例えば、副作用キーワード抽出部１０７又は効能・効果キーワード抽出部１１１は、ストップワードリストを持っていてもよく、ストップワードリスト中のストップワードはキーワードとして抽出しないようにしてもよい。例えば、添付文書２０２に使われる用語の中では、「患者」や「投与」などの語がストップワードに含まれていてもよい。 Further, regarding the above-described embodiment, several examples of keyword extraction algorithms respectively performed by the side effect keyword extraction unit 107 and the efficacy / effect keyword extraction unit 111 are illustrated. However, the algorithm for keyword extraction is not limited to that exemplified above. For example, the side effect keyword extraction unit 107 or the efficacy / effect keyword extraction unit 111 may have a stop word list, and stop words in the stop word list may not be extracted as keywords. For example, in terms used in the package insert 202, words such as “patient” and “administration” may be included in the stop word.
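
Stop-word filtering of this kind can be sketched in a few lines. The set `STOP_WORDS` holds only the two example words given above (“患者”, patient, and “投与”, administration), and the function name `drop_stop_words` is hypothetical.

```python
from typing import FrozenSet, Iterable, List

# The two example stop words from the text; a real list would be longer.
STOP_WORDS: FrozenSet[str] = frozenset({"患者", "投与"})


def drop_stop_words(
    candidates: Iterable[str],
    stop_words: FrozenSet[str] = STOP_WORDS,
) -> List[str]:
    """Keep candidate keywords, in order, except those on the stop-word list."""
    return [word for word in candidates if word not in stop_words]
```

Applied to the candidates 患者, 血圧上昇, 投与, 発疹, the sketch keeps 血圧上昇 and 発疹.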

また、上記実施形態に関しては、形態素解析の結果を利用するキーワード抽出の例として、名詞の連なりをキーワードとして抽出する手法を例示したが、キーワードは名詞の連なりでなくてもよい。例えば、副作用キーワード抽出部１０７又は効能・効果キーワード抽出部１１１は、形態素解析の結果から、形容詞と当該形容詞に後続する名詞の連なりの全体を、キーワードとして抽出してもよい。 In the above embodiment, a technique of extracting a sequence of nouns as a keyword was given as an example of keyword extraction that uses the result of morphological analysis, but a keyword need not be a sequence of nouns. For example, the side effect keyword extraction unit 107 or the efficacy / effect keyword extraction unit 111 may extract, from the result of morphological analysis, an adjective together with the sequence of nouns that follows it as a single keyword.

また、このようにキーワードの中に複数種類の品詞が含まれる場合、同義語辞書２０３には、品詞の差を越えた同義語の対が登録されていてもよい。例えば、「上昇」と「高い」を対にしたエントリが、予め同義語辞書２０３に登録されていてもよい。すると、「血圧上昇」と「高い血圧」という２つのキーワードに対して図７の点数計算処理によって計算される点数もある程度高くなり、キーワードの意味をより良く点数に反映することも可能となる。 Further, when a plurality of types of parts of speech are included in the keyword as described above, the synonym dictionary 203 may register synonym pairs exceeding the difference in parts of speech. For example, an entry in which “rising” and “high” are paired may be registered in the synonym dictionary 203 in advance. Then, the score calculated by the score calculation process of FIG. 7 for the two keywords “blood pressure increase” and “high blood pressure” also increases to some extent, and it is possible to better reflect the meaning of the keyword in the score.
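
The effect of such a cross-part-of-speech synonym entry can be illustrated with the simplified sketch below. It is not the score calculation process of FIG. 7: the point function `points` (length squared, chosen only because it gives a longer match at least as many points as the sum of its parts), the greedy one-to-one matching in `pattern_pair_score`, and taking the maximum over all pairs of division patterns are all assumptions made for illustration.

```python
from itertools import combinations, product
from typing import AbstractSet, FrozenSet, List


def points(length: int) -> int:
    """Assumed point allocation: (a + b)**2 >= a**2 + b**2 for positive lengths."""
    return length * length


def division_patterns(phrase: str) -> List[List[str]]:
    """All ways to cut a phrase between adjacent characters (2**(n-1) patterns)."""
    n = len(phrase)
    patterns = []
    for k in range(n):  # k = number of cut positions
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            patterns.append([phrase[i:j] for i, j in zip(bounds, bounds[1:])])
    return patterns


def pattern_pair_score(parts_a, parts_b, synonyms: AbstractSet[FrozenSet[str]]) -> int:
    """Sum points over substrings of A matched one-to-one with substrings of B."""
    remaining = list(parts_b)
    total = 0
    for part in parts_a:
        for other in remaining:
            if part == other or frozenset((part, other)) in synonyms:
                total += points(min(len(part), len(other)))
                remaining.remove(other)  # each substring of B is used at most once
                break
    return total


def phrase_similarity(a: str, b: str, synonyms: AbstractSet[FrozenSet[str]]) -> int:
    """Assumed aggregation: best score over all pairs of division patterns."""
    pairs = product(division_patterns(a), division_patterns(b))
    return max((pattern_pair_score(pa, pb, synonyms) for pa, pb in pairs), default=0)


# The dictionary entry from the text: "上昇" (rise) paired with "高い" (high).
SYNONYMS = {frozenset(("上昇", "高い"))}
```

Under these assumptions, `phrase_similarity("血圧上昇", "高い血圧", SYNONYMS)` gives 8, whereas with an empty synonym set only “血圧” matches and the result is 4. Enumerating every division pattern is exponential in phrase length, so the sketch is practical only for short keywords.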

最後に、上記の種々の実施形態に関して、さらに下記の付記を開示する。
（付記１）
医薬品を特定するための情報を受け付け、前記情報が示す前記医薬品を判定対象薬として特定する特定手段と、
前記判定対象薬の副作用について記載した報告文書を取得する報告文書取得手段と、
医薬品を一意に識別する識別情報と該医薬品の類薬とを関連付ける類薬学習結果情報を格納手段から読み出すことにより、あるいは、複数の医薬品の各々について、当該医薬品の前記識別情報と当該医薬品の副作用と当該医薬品の効能又は効果を含む添付文書を前記格納手段から読み出すことにより、複数の他の医薬品の中で前記判定対象薬に類似する類薬を認識する類薬認識手段と、
語句内に含まれる部分文字列同士の類似度を評価するのに、第１の長さと第２の長さを足した第３の長さの部分文字列同士が一致する場合には前記第１の長さの部分文字列同士が一致する場合の評価と前記第２の長さの部分文字列同士が一致する場合の評価を足した評価以上の高い評価を与える部分文字列類似度評価手段と、
２つの語句の各々をそれぞれ分割して得られる部分文字列同士の類似度を前記文字列類似度評価手段に評価させ、前記文字列類似度評価手段による評価の結果を集計することで、前記２つの語句の各々を１つ以上の部分文字列に分割する分割パターンの組み合わせを評価し、前記２つの語句それぞれの分割パターンの複数通りの組み合わせについての評価を用いて前記２つの語句同士の類似度を評価する語句類似度評価手段と、
前記報告文書から、前記判定対象薬の前記副作用を示す語句を、判定対象副作用語句として抽出する副作用語句抽出手段と、
前記類薬認識手段により前記類薬として認識された医薬品の添付文書を前記格納手段から読み出して、該添付文書における副作用の記載部分から、語句抽出処理により語句の集合を抽出することによって、あるいは、前記格納手段から、前記類薬認識手段により前記類薬として認識された前記医薬品の添付文書における副作用の記載部分からの語句抽出処理により得られた語句の集合を前記類薬として認識された前記医薬品の前記識別情報と関連付ける副作用学習結果情報を読み出すことによって、前記類薬として認識された前記医薬品の前記添付文書における前記副作用の前記記載部分に含まれる語句の集合を、比較対象語句集合として取得する比較対象集合取得手段と、
前記類薬の少なくとも一部について、それぞれ、当該類薬に関して取得された前記比較対象語句集合に含まれる語句と、前記判定対象副作用語句との組み合わせを、前記語句類似度評価手段に評価させ、評価の結果と第１の閾値とを用いて、前記判定対象副作用語句が示す前記副作用が当該類薬において既知の副作用か否かを判定する判定手段と、
前記判定手段による判定結果を出力する出力手段
を備えることを特徴とする情報処理装置。
（付記２）
前記判定対象薬の添付文書を前記格納手段から読み出し、該添付文書における効能又は効果の記載部分に対して語句抽出処理を行って第１の効能効果語句集合を取得し、前記複数の他の医薬品それぞれについて当該医薬品の添付文書を前記格納手段から読み出し、該添付文書における効能又は効果の記載部分に対して語句抽出処理を行って当該他の医薬品に関する第２の効能効果語句集合を取得する効能効果語句抽出手段と、
前記複数の他の医薬品の少なくとも一部について、それぞれ、
当該他の医薬品に関して取得された前記第２の効能効果語句集合に含まれる語句と、前記第１の効能効果語句集合に含まれる語句との組み合わせを、前記語句類似度評価手段に評価させ、
複数の組み合わせについての評価を集計することで、前記判定対象薬と当該他の医薬品との間の効能又は効果の類似度を示す値を算出し、
算出した前記値を第２の閾値と比較し、
算出した前記値の示す類似度が前記第２の閾値の示す類似度よりも高いとき、当該他の医薬品を前記判定対象薬の類薬と判定し、前記判定対象薬の前記識別情報と当該他の医薬品を関連付けるように前記格納手段上の前記類薬学習結果情報を更新する第１の類薬判定手段と、
をさらに備え、
前記類薬認識手段は、前記第１の類薬判定手段の判定結果にしたがって前記判定対象薬の前記類薬を認識する
ことを特徴とする付記１に記載の情報処理装置。
（付記３）
前記複数の他の医薬品のうち、前記添付文書に記載されている薬効分類名、基準名、一般名、化学名又は構造式が前記判定対象薬と一致する医薬品を、前記類薬と判定し、前記類薬と判定した当該医薬品を前記判定対象薬の前記識別情報と関連付けるように前記格納手段上の前記類薬学習結果情報を更新する第２の類薬判定手段をさらに備え、
前記第１の類薬判定手段は、前記複数の他の医薬品のうち前記第２の類薬判定手段により前記類薬として判定されていない医薬品について、前記類薬か否かの判定を行い、
前記類薬認識手段は、前記第２の類薬判定手段と前記第１の類薬判定手段双方の判定結果にしたがって前記判定対象薬の前記類薬を認識する
ことを特徴とする付記２に記載の情報処理装置。
（付記４）
前記語句類似度評価手段は、前記２つの語句について、それぞれの語句内で互いに隣接する任意の２文字の間で分割する分割パターンとして可能な分割パターン同士のすべての組み合わせについての評価を集計することで、前記２つの語句同士の前記類似度を求める
ことを特徴とする付記１から３のいずれか１項に記載の情報処理装置。
（付記５）
前記語句類似度評価手段は、前記２つの語句各々を形態素区切りの位置で分割する分割パターンのみを用いて前記２つの語句同士の前記類似度を求める
ことを特徴とする付記１から３のいずれか１項に記載の情報処理装置。
（付記６）
コンピュータに、
医薬品を特定するための情報を受け付け、
前記情報が示す前記医薬品を判定対象薬として特定し、
前記判定対象薬の副作用について記載した報告文書を取得し、
医薬品を一意に識別する識別情報と該医薬品の類薬とを関連付ける類薬学習結果情報を格納手段から読み出すことにより、あるいは、複数の医薬品の各々について、当該医薬品の前記識別情報と当該医薬品の副作用と当該医薬品の効能又は効果を含む添付文書を前記格納手段から読み出すことにより、複数の他の医薬品の中で前記判定対象薬に類似する類薬を認識し、
前記報告文書から、前記判定対象薬の前記副作用を示す語句を、判定対象副作用語句として抽出し、
前記類薬として認識した医薬品の添付文書を前記格納手段から読み出して、該添付文書における副作用の記載部分から、語句抽出処理により語句の集合を抽出することによって、あるいは、前記格納手段から、前記類薬として認識した前記医薬品の添付文書における副作用の記載部分からの語句抽出処理により得られた語句の集合を前記類薬として認識した前記医薬品の前記識別情報と関連付ける副作用学習結果情報を読み出すことによって、前記類薬として認識した前記医薬品の前記添付文書における前記副作用の前記記載部分に含まれる語句の集合を、比較対象語句集合として取得し、
前記類薬の少なくとも一部について、それぞれ、当該類薬に関して取得された前記比較対象語句集合に含まれる語句と、前記判定対象副作用語句との組み合わせを評価し、評価の結果と第１の閾値とを用いて、前記判定対象副作用語句が示す前記副作用が当該類薬において既知の副作用か否かを判定し、
前記判定対象副作用語句が示す前記副作用が既知の副作用か否かの判定結果を出力する
ことを含む副作用判定処理を実行させ、
前記比較対象語句集合に含まれる前記語句と前記判定対象副作用語句との前記組み合わせの評価のために、語句類似度評価処理として、
２つの語句の各々をそれぞれ分割して得られる部分文字列同士の類似度を、第１の長さと第２の長さを足した第３の長さの部分文字列同士が一致する場合には前記第１の長さの部分文字列同士が一致する場合の評価と前記第２の長さの部分文字列同士が一致する場合の評価を足した評価以上の高い評価を与えるようにして評価し、
前記部分文字列同士について評価した前記類似度を集計することで、前記２つの語句の各々を１つ以上の部分文字列に分割する分割パターンの組み合わせを評価し、
前記２つの語句それぞれの分割パターンの複数通りの組み合わせについての評価を用いて前記２つの語句同士の類似度を評価する
ことを含む処理を実行させる判定プログラム。
（付記７）
前記副作用判定処理は、さらに、
前記判定対象薬の添付文書を前記格納手段から読み出し、該添付文書における効能又は効果の記載部分に対して語句抽出処理を行って第１の効能効果語句集合を取得し、
前記複数の他の医薬品それぞれについて当該医薬品の添付文書を前記格納手段から読み出し、該添付文書における効能又は効果の記載部分に対して語句抽出処理を行って当該他の医薬品に関する第２の効能効果語句集合を取得する
ことを含み、
前記判定プログラムは、前記コンピュータに、
前記複数の他の医薬品の少なくとも一部について、それぞれ、
当該他の医薬品に関して取得された前記第２の効能効果語句集合に含まれる語句と、前記第１の効能効果語句集合に含まれる語句との組み合わせを、前記語句類似度評価処理により評価し、
複数の組み合わせについての評価を集計することで、前記判定対象薬と当該他の医薬品との間の効能又は効果の類似度を示す値を算出し、
算出した前記値を第２の閾値と比較し、算出した前記値の示す類似度が前記第２の閾値の示す類似度よりも高いとき、当該他の医薬品を前記判定対象薬の類薬と判定し、前記判定対象薬の前記識別情報と当該他の医薬品を関連付けるように前記格納手段上の前記類薬学習結果情報を更新する
ことを含む第１の類薬判定処理をさらに実行させ、
前記第１の類薬判定処理の判定結果にしたがって前記判定対象薬の前記類薬を認識させる、
ことを特徴とする付記６に記載の判定プログラム。
（付記８）
前記判定プログラムは、
前記複数の他の医薬品のうち、前記添付文書に記載されている薬効分類名、基準名、一般名、化学名又は構造式が前記判定対象薬と一致する医薬品を、前記類薬と判定し、前記類薬と判定した当該医薬品を前記判定対象薬の前記識別情報と関連付けるように前記格納手段上の前記類薬学習結果情報を更新する第２の類薬判定処理を前記コンピュータに実行させ、
前記複数の他の医薬品のうち前記第２の類薬判定処理により前記類薬として判定されていない医薬品を対象として、前記コンピュータに前記第１の類薬判定処理を行わせ、
前記第２の類薬判定処理と前記第１の類薬判定処理双方の判定結果にしたがって前記コンピュータに前記判定対象薬の前記類薬を認識させる
ことを特徴とする付記７に記載の判定プログラム。
（付記９）
前記語句類似度評価処理は、前記２つの語句について、それぞれの語句内で互いに隣接する任意の２文字の間で分割する分割パターンとして可能な分割パターン同士のすべての組み合わせについての評価を集計することで、前記２つの語句同士の前記類似度を求めることを含む
ことを特徴とする付記６から８のいずれか１項に記載の判定プログラム。
（付記１０）
前記語句類似度評価処理は、前記２つの語句各々を形態素区切りの位置で分割する分割パターンのみを用いて前記２つの語句同士の前記類似度を求めることを含む
ことを特徴とする付記６から８のいずれか１項に記載の判定プログラム。
（付記１１）
コンピュータが、
医薬品を特定するための情報を受け付け、
前記情報が示す前記医薬品を判定対象薬として特定し、
前記判定対象薬の副作用について記載した報告文書を取得し、
医薬品を一意に識別する識別情報と該医薬品の類薬とを関連付ける類薬学習結果情報を格納手段から読み出すことにより、あるいは、複数の医薬品の各々について、当該医薬品の前記識別情報と当該医薬品の副作用と当該医薬品の効能又は効果を含む添付文書を前記格納手段から読み出すことにより、複数の他の医薬品の中で前記判定対象薬に類似する類薬を認識し、
前記報告文書から、前記判定対象薬の前記副作用を示す語句を、判定対象副作用語句として抽出し、
前記類薬として認識した医薬品の添付文書を前記格納手段から読み出して、該添付文書における副作用の記載部分から、語句抽出処理により語句の集合を抽出することによって、あるいは、前記格納手段から、前記類薬として認識した前記医薬品の添付文書における副作用の記載部分からの語句抽出処理により得られた語句の集合を前記類薬として認識した前記医薬品の前記識別情報と関連付ける副作用学習結果情報を読み出すことによって、前記類薬として認識した前記医薬品の前記添付文書における前記副作用の前記記載部分に含まれる語句の集合を、比較対象語句集合として取得し、
前記類薬の少なくとも一部について、それぞれ、当該類薬に関して取得された前記比較対象語句集合に含まれる語句と、前記判定対象副作用語句との組み合わせを評価し、評価の結果と第１の閾値とを用いて、前記判定対象副作用語句が示す前記副作用が当該類薬において既知の副作用か否かを判定し、
前記判定対象副作用語句が示す前記副作用が既知の副作用か否かの判定結果を出力し、
前記比較対象語句集合に含まれる前記語句と前記判定対象副作用語句との前記組み合わせの評価のために、語句類似度評価処理として、
２つの語句の各々をそれぞれ分割して得られる部分文字列同士の類似度を、第１の長さと第２の長さを足した第３の長さの部分文字列同士が一致する場合には前記第１の長さの部分文字列同士が一致する場合の評価と前記第２の長さの部分文字列同士が一致する場合の評価を足した評価以上の高い評価を与えるようにして評価し、
前記部分文字列同士について評価した前記類似度を集計することで、前記２つの語句の各々を１つ以上の部分文字列に分割する分割パターンの組み合わせを評価し、
前記２つの語句それぞれの分割パターンの複数通りの組み合わせについての評価を用いて前記２つの語句同士の類似度を評価する
ことを含む処理を実行する
ことを特徴とする判定方法。
（付記１２）
前記コンピュータが、さらに、
前記判定対象薬の添付文書を前記格納手段から読み出し、該添付文書における効能又は効果の記載部分に対して語句抽出処理を行って第１の効能効果語句集合を取得し、
前記複数の他の医薬品それぞれについて当該医薬品の添付文書を前記格納手段から読み出し、該添付文書における効能又は効果の記載部分に対して語句抽出処理を行って当該他の医薬品に関する第２の効能効果語句集合を取得し、
前記複数の他の医薬品の少なくとも一部について、それぞれ、
当該他の医薬品に関して取得された前記第２の効能効果語句集合に含まれる語句と、前記第１の効能効果語句集合に含まれる語句との組み合わせを、前記語句類似度評価処理により評価し、
複数の組み合わせについての評価を集計することで、前記判定対象薬と当該他の医薬品との間の効能又は効果の類似度を示す値を算出し、
算出した前記値を第２の閾値と比較し、算出した前記値の示す類似度が前記第２の閾値の示す類似度よりも高いとき、当該他の医薬品を前記判定対象薬の類薬と判定し、前記判定対象薬の前記識別情報と当該他の医薬品を関連付けるように前記格納手段上の前記類薬学習結果情報を更新する
ことを含む第１の類薬判定処理を実行し、
前記第１の類薬判定処理の判定結果にしたがって前記判定対象薬の前記類薬を認識する、
ことを特徴とする付記１１に記載の判定方法。
（付記１３）
前記コンピュータが、
前記複数の他の医薬品のうち、前記添付文書に記載されている薬効分類名、基準名、一般名、化学名又は構造式が前記判定対象薬と一致する医薬品を、前記類薬と判定し、前記類薬と判定した当該医薬品を前記判定対象薬の前記識別情報と関連付けるように前記格納手段上の前記類薬学習結果情報を更新する第２の類薬判定処理をさらに実行し、
前記複数の他の医薬品のうち前記第２の類薬判定処理により前記類薬として判定されていない医薬品を対象として、前記第１の類薬判定処理を行い、
前記第２の類薬判定処理と前記第１の類薬判定処理双方の判定結果にしたがって前記判定対象薬の前記類薬を認識する
ことを特徴とする付記１２に記載の判定方法。
（付記１４）
前記語句類似度評価処理は、前記２つの語句について、それぞれの語句内で互いに隣接する任意の２文字の間で分割する分割パターンとして可能な分割パターン同士のすべての組み合わせについての評価を集計することで、前記２つの語句同士の前記類似度を求めることを含む
ことを特徴とする付記１１から１３のいずれか１項に記載の判定方法。
（付記１５）
前記語句類似度評価処理は、前記２つの語句各々を形態素区切りの位置で分割する分割パターンのみを用いて前記２つの語句同士の前記類似度を求めることを含む
ことを特徴とする付記１１から１３のいずれか１項に記載の判定方法。

Finally, the following supplementary notes are further disclosed regarding the various embodiments described above.
(Appendix 1)
An information processing apparatus comprising:
specifying means for receiving information for specifying a drug and specifying the drug indicated by the information as a determination target drug;
report document acquisition means for acquiring a report document describing a side effect of the determination target drug;
similar drug recognition means for recognizing, among a plurality of other drugs, a similar drug that is similar to the determination target drug, either by reading from storage means similar drug learning result information that associates identification information uniquely identifying a drug with similar drugs of that drug, or by reading from the storage means, for each of a plurality of drugs, a package insert containing the identification information of the drug, the side effects of the drug, and the efficacy or effect of the drug;
partial character string similarity evaluation means for evaluating the similarity between partial character strings contained in phrases such that, when partial character strings of a third length equal to the sum of a first length and a second length match, the evaluation given is equal to or higher than the sum of the evaluation given when partial character strings of the first length match and the evaluation given when partial character strings of the second length match;
phrase similarity evaluation means for causing the character string similarity evaluation means to evaluate the similarity between partial character strings obtained by dividing each of two phrases, evaluating a combination of division patterns, each of which divides one of the two phrases into one or more partial character strings, by aggregating the results of the evaluation by the character string similarity evaluation means, and evaluating the similarity between the two phrases using the evaluations of a plurality of combinations of the division patterns of the two phrases;
side effect phrase extraction means for extracting, from the report document, a phrase indicating the side effect of the determination target drug as a determination target side effect phrase;
comparison target set acquisition means for acquiring, as a comparison target phrase set, a set of phrases contained in the side effect description part of the package insert of the drug recognized as the similar drug, either by reading from the storage means the package insert of the drug recognized as the similar drug by the similar drug recognition means and extracting a set of phrases from the side effect description part of that package insert by a phrase extraction process, or by reading from the storage means side effect learning result information that associates a set of phrases, obtained by a phrase extraction process from the side effect description part of the package insert of the drug recognized as the similar drug by the similar drug recognition means, with the identification information of that drug;
determination means for causing, for each of at least some of the similar drugs, the phrase similarity evaluation means to evaluate combinations of a phrase contained in the comparison target phrase set acquired for that similar drug and the determination target side effect phrase, and determining, using the evaluation results and a first threshold, whether the side effect indicated by the determination target side effect phrase is a known side effect of that similar drug; and
output means for outputting a determination result of the determination means.
(Appendix 2)
The information processing apparatus according to appendix 1, further comprising:
efficacy / effect phrase extraction means for reading the package insert of the determination target drug from the storage means and performing a phrase extraction process on the efficacy or effect description part of that package insert to acquire a first efficacy / effect phrase set, and for reading, for each of the plurality of other drugs, the package insert of that drug from the storage means and performing a phrase extraction process on the efficacy or effect description part of that package insert to acquire a second efficacy / effect phrase set for that other drug; and
first similar drug determination means for, with respect to each of at least some of the plurality of other drugs,
causing the phrase similarity evaluation means to evaluate combinations of a phrase contained in the second efficacy / effect phrase set acquired for that other drug and a phrase contained in the first efficacy / effect phrase set,
calculating a value indicating the similarity in efficacy or effect between the determination target drug and that other drug by aggregating the evaluations of a plurality of the combinations,
comparing the calculated value with a second threshold, and
when the similarity indicated by the calculated value is higher than the similarity indicated by the second threshold, determining that other drug to be a similar drug of the determination target drug and updating the similar drug learning result information in the storage means so as to associate the identification information of the determination target drug with that other drug,
wherein the similar drug recognition means recognizes the similar drug of the determination target drug according to the determination result of the first similar drug determination means.
(Appendix 3)
The information processing apparatus according to appendix 2, further comprising second similar drug determination means for determining, as the similar drug, a drug among the plurality of other drugs whose drug efficacy classification name, standard name, generic name, chemical name, or structural formula described in the package insert matches that of the determination target drug, and updating the similar drug learning result information in the storage means so as to associate the drug determined to be the similar drug with the identification information of the determination target drug,
wherein the first similar drug determination means determines whether a drug is the similar drug for those of the plurality of other drugs that have not been determined to be the similar drug by the second similar drug determination means, and
the similar drug recognition means recognizes the similar drug of the determination target drug according to the determination results of both the second similar drug determination means and the first similar drug determination means.
(Appendix 4)
The information processing apparatus according to any one of appendices 1 to 3, wherein the phrase similarity evaluation means obtains the similarity between the two phrases by aggregating the evaluations of all combinations of the division patterns that are possible when each phrase may be divided between any two mutually adjacent characters within it.
(Appendix 5)
The information processing apparatus according to any one of appendices 1 to 3, wherein the phrase similarity evaluation means obtains the similarity between the two phrases using only division patterns that divide each of the two phrases at morpheme boundaries.
(Appendix 6)
On the computer,
Accepts information to identify medicines,
The drug indicated by the information is identified as a determination target drug,
Obtain a report document describing the side effects of the determination target drug,
By reading from the storage means the identification information that uniquely identifies the drug and the similar drug learning result information from the storage means, or for each of a plurality of drugs, the identification information of the drug and the side effects of the drug And a medicinal product similar to the determination target drug among a plurality of other medicinal products by reading out the package insert including the efficacy or effect of the medicinal product from the storage unit,
From the report document, the phrase indicating the side effect of the determination target drug is extracted as a determination target side effect phrase,
By reading out a package insert of a medicine recognized as the similar drug from the storage means, and extracting a set of phrases from the side effect description part of the package insert by a phrase extraction process, or from the storage section, the class By reading out the side effect learning result information that associates the set of phrases obtained by the phrase extraction process from the description part of the side effects in the package insert of the drug recognized as a drug with the identification information of the drug recognized as the similar drug, A set of phrases included in the description part of the side effect in the package insert of the drug recognized as the similar drug is obtained as a set of phrases to be compared;
For at least a part of the similar drugs, the combination of the phrase included in the set of comparison target words acquired for the similar drug and the determination target side effect phrase is evaluated, and the result of the evaluation and the first threshold value To determine whether the side effect indicated by the determination target side effect phrase is a known side effect in the related drug,
Executing a side effect determination process including outputting a determination result of whether or not the side effect indicated by the determination target side effect phrase is a known side effect;
Wherein, in order to evaluate the combination of a phrase included in the comparison target phrase set and the determination target side effect phrase, the determination program causes the computer to execute, as a phrase similarity evaluation process, processing including:
Evaluating the similarity between partial character strings obtained by dividing each of the two phrases, such that when partial character strings of a third length equal to the sum of a first length and a second length match, a higher evaluation is given than the sum of the evaluation given when partial character strings of the first length match and the evaluation given when partial character strings of the second length match;
Evaluating a combination of division patterns, each of which divides one of the two phrases into one or more partial character strings, by totaling the similarities evaluated for the partial character strings; and
Evaluating the similarity between the two phrases using the evaluations of a plurality of combinations of the division patterns of the two phrases.
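The scoring condition on partial character strings can be illustrated with a minimal sketch. The appendix only requires that a match of the third length (first length plus second length) scores higher than the two shorter matches combined; the squared-length score and the pairwise matching rule below are assumptions chosen for illustration, and the function names are hypothetical.

```python
def substring_score(length: int) -> int:
    """Score for a matching partial character string of the given length.

    Assumption: length squared. Any function with
    score(a + b) > score(a) + score(b) would satisfy the stated condition.
    """
    return length * length


def evaluate_combination(parts_a: list[str], parts_b: list[str]) -> int:
    """Evaluate one combination of two division patterns.

    Assumption: every pair of identical partial character strings,
    one from each pattern, contributes its score to the total.
    """
    total = 0
    for p in parts_a:
        for q in parts_b:
            if p == q:
                total += substring_score(len(p))
    return total
```

With this scoring, one matching string of three characters contributes 9, whereas separate matches of one and two characters contribute only 1 + 4 = 5, so longer contiguous matches are favored.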
(Appendix 7)
The determination program according to appendix 6, wherein
The side effect determination process further includes:
Reading the package insert of the determination target drug from the storage means and performing a phrase extraction process on the efficacy or effect description part of the package insert to obtain a first efficacy/effect phrase set; and
For each of the plurality of other drugs, reading the package insert of the drug from the storage means and performing the phrase extraction process on the efficacy or effect description part of the package insert to obtain a second efficacy/effect phrase set for the other drug,
The determination program further causes the computer to execute a first similar drug determination process including, for each of at least some of the other drugs:
Evaluating, by the phrase similarity evaluation process, each combination of a phrase included in the second efficacy/effect phrase set acquired for the other drug and a phrase included in the first efficacy/effect phrase set;
Calculating, by totaling the evaluations for a plurality of the combinations, a value indicating the similarity in efficacy or effect between the determination target drug and the other drug;
Comparing the calculated value with a second threshold; and
When the similarity indicated by the calculated value is higher than the similarity indicated by the second threshold, determining the other drug to be a similar drug of the determination target drug and updating the similar drug learning result information on the storage means so as to associate the identification information of the determination target drug with the other drug, and
The similar drug of the determination target drug is recognized according to the determination result of the first similar drug determination process.
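The first similar drug determination process can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the phrase similarity function is passed in as a parameter, the totaled evaluation is normalized by the number of phrase pairs (an assumption, since the text only says the evaluations are totaled), and all names are hypothetical.

```python
from typing import Callable

Similarity = Callable[[str, str], float]


def efficacy_similarity(first_set: list[str], second_set: list[str],
                        similarity: Similarity) -> float:
    """Total the phrase similarities over all pairs, then normalize.

    Assumption: dividing by the pair count keeps the value comparable
    across drugs with different numbers of efficacy/effect phrases.
    """
    pairs = [(p, q) for p in first_set for q in second_set]
    if not pairs:
        return 0.0
    return sum(similarity(p, q) for p, q in pairs) / len(pairs)


def find_similar_drugs(target_phrases: list[str],
                       other_drugs: dict[str, list[str]],
                       similarity: Similarity,
                       second_threshold: float) -> list[str]:
    """Return IDs of other drugs whose efficacy similarity exceeds the threshold."""
    return [
        drug_id
        for drug_id, phrases in other_drugs.items()
        if efficacy_similarity(target_phrases, phrases, similarity) > second_threshold
    ]
```

The returned IDs correspond to the drugs that would be associated with the determination target drug in the similar drug learning result information.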
(Appendix 8)
The determination program according to appendix 7, wherein
The determination program further causes the computer to execute a second similar drug determination process of determining, among the plurality of other drugs, a drug whose medicinal property classification name, reference name, common name, chemical name, or structural formula described in its package insert matches that of the determination target drug to be a similar drug, and updating the similar drug learning result information on the storage means so as to associate the drug determined to be the similar drug with the identification information of the determination target drug,
The computer performs the first similar drug determination process on a drug that, among the plurality of other drugs, has not been determined to be the similar drug by the second similar drug determination process, and
The computer recognizes the similar drug of the determination target drug according to the determination results of both the second similar drug determination process and the first similar drug determination process.
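The two-stage recognition can be sketched as a name-based check followed by an efficacy-based fallback. The sketch assumes that the second similar drug determination process compares the listed name fields of two package inserts for equality; the field names, the dictionary layout, and the `by_efficacy` callback are all hypothetical.

```python
from typing import Callable

# Hypothetical keys for the name fields read from a package insert.
NAME_FIELDS = ("classification_name", "reference_name", "common_name",
               "chemical_name", "structural_formula")


def recognize_similar_drugs(target: dict, others: dict[str, dict],
                            by_efficacy: Callable[[dict, dict], bool]) -> set[str]:
    """Combine the name-based check with the efficacy-based check."""
    similar: set[str] = set()
    for drug_id, insert in others.items():
        # Second process (assumed): any listed name field is present and equal.
        name_match = any(
            target.get(field) is not None and target.get(field) == insert.get(field)
            for field in NAME_FIELDS
        )
        # First process: `or` short-circuits, so it runs only when the
        # name-based check did not already find a match.
        if name_match or by_efficacy(target, insert):
            similar.add(drug_id)
    return similar
```

Running the cheaper name comparison first means the phrase similarity evaluation is needed only for drugs that the name comparison leaves undecided.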
(Appendix 9)
The determination program according to any one of appendices 6 to 8, wherein the phrase similarity evaluation process includes calculating the similarity between the two phrases by totaling the evaluations for all combinations of division patterns, where a division pattern may divide a phrase between any two adjacent characters.
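Totaling over every combination of division patterns can be sketched as below. A phrase of n characters has n - 1 positions between adjacent characters, so it has 2^(n-1) division patterns. The squared-length score and the rule that identical partial character strings are matched pairwise are assumptions for illustration, and the function names are hypothetical.

```python
from itertools import combinations


def division_patterns(phrase: str) -> list[list[str]]:
    """All ways to cut a phrase between adjacent characters."""
    n = len(phrase)
    patterns = []
    for k in range(n):  # k = number of cut positions used
        for cuts in combinations(range(1, n), k):
            bounds = [0, *cuts, n]
            patterns.append([phrase[i:j] for i, j in zip(bounds, bounds[1:])])
    return patterns


def phrase_similarity(a: str, b: str) -> int:
    """Total the evaluations over every combination of division patterns."""
    total = 0
    for pattern_a in division_patterns(a):
        for pattern_b in division_patterns(b):
            for p in pattern_a:
                for q in pattern_b:
                    if p == q:
                        total += len(p) ** 2  # assumed superadditive score
    return total
```

Because the number of patterns grows exponentially with phrase length, this exhaustive form is practical only for short phrases such as individual side effect terms.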
(Appendix 10)
The determination program according to any one of the above, wherein the phrase similarity evaluation process includes obtaining the similarity between the two phrases using only division patterns that divide each of the two phrases at morpheme boundaries.
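Restricting the division patterns to morpheme boundaries can be sketched as follows. The sketch assumes each phrase has already been segmented into morphemes by a morphological analyzer that is not shown here; the function name is hypothetical.

```python
from itertools import combinations


def morpheme_division_patterns(morphemes: list[str]) -> list[list[str]]:
    """Division patterns that cut a phrase only at morpheme boundaries.

    Adjacent morphemes are either kept separate or joined, so a phrase
    of m morphemes yields 2**(m - 1) patterns.
    """
    m = len(morphemes)
    patterns = []
    for k in range(m):  # k = number of boundaries used as cut positions
        for cuts in combinations(range(1, m), k):
            bounds = [0, *cuts, m]
            patterns.append(
                ["".join(morphemes[i:j]) for i, j in zip(bounds, bounds[1:])]
            )
    return patterns
```

A phrase of three morphemes yields four patterns instead of the much larger number produced by cutting between every pair of adjacent characters, which reduces the number of combinations to evaluate.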
(Appendix 11)
A determination method in which a computer:
Accepts information identifying a drug;
Identifies the drug indicated by the information as a determination target drug;
Acquires a report document describing a side effect of the determination target drug;
Recognizes a similar drug of the determination target drug from among a plurality of other drugs, either by reading from a storage means similar drug learning result information that associates identification information uniquely identifying a drug with its similar drugs, or by reading from the storage means, for each of a plurality of drugs, a package insert that includes the identification information of the drug, the side effects of the drug, and the efficacy or effects of the drug;
Extracts, from the report document, a phrase indicating the side effect of the determination target drug as a determination target side effect phrase;
Acquires, as a comparison target phrase set, the set of phrases included in the side effect description part of the package insert of a drug recognized as the similar drug, either by reading the package insert of that drug from the storage means and extracting a set of phrases from its side effect description part by a phrase extraction process, or by reading from the storage means side effect learning result information that associates the set of phrases obtained by the phrase extraction process from the side effect description part of the package insert of that drug with the identification information of that drug;
For at least a part of the similar drugs, evaluates each combination of a phrase included in the comparison target phrase set acquired for the similar drug and the determination target side effect phrase, and determines, using the result of the evaluation and a first threshold value, whether the side effect indicated by the determination target side effect phrase is a known side effect of the similar drug;
Outputs a determination result as to whether the side effect indicated by the determination target side effect phrase is a known side effect; and
In order to evaluate the combination of a phrase included in the comparison target phrase set and the determination target side effect phrase, executes, as a phrase similarity evaluation process, processing including:
Evaluating the similarity between partial character strings obtained by dividing each of the two phrases, such that when partial character strings of a third length equal to the sum of a first length and a second length match, a higher evaluation is given than the sum of the evaluation given when partial character strings of the first length match and the evaluation given when partial character strings of the second length match;
Evaluating a combination of division patterns, each of which divides one of the two phrases into one or more partial character strings, by totaling the similarities evaluated for the partial character strings; and
Evaluating the similarity between the two phrases using the evaluations of a plurality of combinations of the division patterns of the two phrases.
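The known or unknown determination against the first threshold can be sketched as follows. The text states only that the evaluation result and the first threshold are used; taking the best-scoring phrase per similar drug and comparing it with `>=` are assumptions, the similarity function is supplied by the caller, and all names are hypothetical.

```python
from typing import Callable


def judge_side_effect(target_phrase: str,
                      comparison_sets: dict[str, list[str]],
                      similarity: Callable[[str, str], float],
                      first_threshold: float) -> dict[str, bool]:
    """For each similar drug, decide whether the reported side effect is known.

    comparison_sets maps a similar drug's ID to the phrases extracted from
    the side effect description part of its package insert.
    """
    result = {}
    for drug_id, phrases in comparison_sets.items():
        # Assumption: the best-matching phrase decides the outcome.
        best = max((similarity(target_phrase, p) for p in phrases), default=0.0)
        result[drug_id] = best >= first_threshold
    return result
```

A caller could treat the side effect as known overall when `any(result.values())` is true, and list the per-drug results in the output.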
(Appendix 12)
The determination method according to appendix 11, wherein the computer further:
Reads the package insert of the determination target drug from the storage means and performs a phrase extraction process on the efficacy or effect description part of the package insert to obtain a first efficacy/effect phrase set;
For each of the plurality of other drugs, reads the package insert of the drug from the storage means and performs the phrase extraction process on the efficacy or effect description part of the package insert to obtain a second efficacy/effect phrase set for the other drug;
Executes a first similar drug determination process including, for each of at least some of the other drugs:
Evaluating, by the phrase similarity evaluation process, each combination of a phrase included in the second efficacy/effect phrase set acquired for the other drug and a phrase included in the first efficacy/effect phrase set;
Calculating, by totaling the evaluations for a plurality of the combinations, a value indicating the similarity in efficacy or effect between the determination target drug and the other drug;
Comparing the calculated value with a second threshold; and
When the similarity indicated by the calculated value is higher than the similarity indicated by the second threshold, determining the other drug to be a similar drug of the determination target drug and updating the similar drug learning result information on the storage means so as to associate the identification information of the determination target drug with the other drug; and
Recognizes the similar drug of the determination target drug according to the determination result of the first similar drug determination process.
(Appendix 13)
The determination method according to appendix 12, wherein the computer:
Further executes a second similar drug determination process of determining, among the plurality of other drugs, a drug whose medicinal property classification name, reference name, common name, chemical name, or structural formula described in its package insert matches that of the determination target drug to be a similar drug, and updating the similar drug learning result information on the storage means so as to associate the drug determined to be the similar drug with the identification information of the determination target drug;
Performs the first similar drug determination process on a drug that, among the plurality of other drugs, has not been determined to be the similar drug by the second similar drug determination process; and
Recognizes the similar drug of the determination target drug according to the determination results of both the second similar drug determination process and the first similar drug determination process.
(Appendix 14)
The determination method according to any one of appendices 11 to 13, wherein the phrase similarity evaluation process includes calculating the similarity between the two phrases by totaling the evaluations for all combinations of division patterns, where a division pattern may divide a phrase between any two adjacent characters.
(Appendix 15)
The determination method according to any one of the above, wherein the phrase similarity evaluation process includes obtaining the similarity between the two phrases using only division patterns that divide each of the two phrases at morpheme boundaries.

DESCRIPTION OF SYMBOLS
100 Determination apparatus
101 Storage unit
102 Side effect processing unit
103 Similar drug processing unit
104 Determination target drug designation unit
105 Similar drug recognition unit
106 Report document acquisition unit
107 Side effect keyword extraction unit
108 Side effect determination/learning unit
109 Keyword similarity evaluation unit
110 Partial character string similarity evaluation unit
111 Efficacy/effect keyword extraction unit
112 Similar drug determination/learning unit
113 Preprocessing control unit
201 Package insert group
202 Package insert
203 Synonym dictionary
204 Learning result table
205 Safety information report document
206 Determination target drug ID
300 Computer
301 CPU
302 ROM
303 RAM
304 Communication interface
305 Input device
306 Output device
307 Storage device
308 Drive device
309 Bus
310 Portable storage medium
311 Network
312 Other computer
400 Side effect determination result screen
401 Determination target drug display field
402 Determination result list
403 Radio button
404 Learning button
501 Scoring information
502 Reference value information
601, 602 Keyword
603a to 603c, 604a to 604c Division pattern

Claims

A means for receiving information for identifying a drug, and identifying the drug indicated by the information as a determination target drug;
Report document acquisition means for acquiring a report document describing the side effects of the determination target drug;
Similar drug recognition means for recognizing a similar drug of the determination target drug from among a plurality of other drugs, either by reading from a storage means similar drug learning result information that associates identification information uniquely identifying a drug with its similar drugs, or by reading from the storage means, for each of a plurality of drugs, a package insert that includes the identification information of the drug, the side effects of the drug, and the efficacy or effects of the drug;
Partial character string similarity evaluation means for evaluating the similarity between partial character strings included in phrases, such that when partial character strings of a third length equal to the sum of a first length and a second length match, a higher evaluation is given than the sum of the evaluation given when partial character strings of the first length match and the evaluation given when partial character strings of the second length match;
Phrase similarity evaluation means for evaluating a combination of division patterns, each of which divides one of two phrases into one or more partial character strings, by causing the partial character string similarity evaluation means to evaluate the similarity between the partial character strings obtained by dividing each of the two phrases and totaling the evaluation results, and for evaluating the similarity between the two phrases using the evaluations of a plurality of combinations of the division patterns of the two phrases;
Side effect phrase extraction means for extracting, from the report document, a phrase indicating the side effect of the determination target drug as a determination target side effect phrase;
Comparison target set acquisition means for acquiring, as a comparison target phrase set, the set of phrases included in the side effect description part of the package insert of a drug recognized as the similar drug by the similar drug recognition means, either by reading the package insert of that drug from the storage means and extracting a set of phrases from its side effect description part by a phrase extraction process, or by reading from the storage means side effect learning result information that associates the set of phrases obtained by the phrase extraction process from the side effect description part of the package insert of that drug with the identification information of that drug;
Determination means for causing, for at least a part of the similar drugs, the phrase similarity evaluation means to evaluate each combination of a phrase included in the comparison target phrase set acquired for the similar drug and the determination target side effect phrase, and for determining, using the result of the evaluation and a first threshold, whether the side effect indicated by the determination target side effect phrase is a known side effect of the similar drug;
An information processing apparatus comprising: output means for outputting a determination result by the determination means.

The information processing apparatus according to claim 1, further comprising:
Efficacy/effect phrase extraction means for reading the package insert of the determination target drug from the storage means and performing a phrase extraction process on the efficacy or effect description part of the package insert to obtain a first efficacy/effect phrase set, and for reading, for each of the plurality of other drugs, the package insert of the drug from the storage means and performing the phrase extraction process on the efficacy or effect description part of the package insert to obtain a second efficacy/effect phrase set for the other drug; and
First similar drug determination means for, for each of at least some of the other drugs,
Causing the phrase similarity evaluation means to evaluate each combination of a phrase included in the second efficacy/effect phrase set acquired for the other drug and a phrase included in the first efficacy/effect phrase set,
Calculating, by totaling the evaluations for a plurality of the combinations, a value indicating the similarity in efficacy or effect between the determination target drug and the other drug,
Comparing the calculated value with a second threshold, and
When the similarity indicated by the calculated value is higher than the similarity indicated by the second threshold, determining the other drug to be a similar drug of the determination target drug and updating the similar drug learning result information on the storage means so as to associate the identification information of the determination target drug with the other drug,
Wherein the similar drug recognition means recognizes the similar drug of the determination target drug according to a determination result of the first similar drug determination means.

The information processing apparatus as described, further comprising second similar drug determination means for determining, among the plurality of other drugs, a drug whose medicinal property classification name, reference name, common name, chemical name, or structural formula described in its package insert matches that of the determination target drug to be a similar drug, and for updating the similar drug learning result information on the storage means so as to associate the drug determined to be the similar drug with the identification information of the determination target drug,
Wherein the first similar drug determination means determines whether a drug that, among the plurality of other drugs, has not been determined to be the similar drug by the second similar drug determination means is the similar drug, and
The similar drug recognition means recognizes the similar drug of the determination target drug according to the determination results of both the second similar drug determination means and the first similar drug determination means.

A determination program that causes a computer to execute a side effect determination process including:
Accepting information identifying a drug;
Identifying the drug indicated by the information as a determination target drug;
Acquiring a report document describing a side effect of the determination target drug;
Recognizing a similar drug of the determination target drug from among a plurality of other drugs, either by reading from a storage means similar drug learning result information that associates identification information uniquely identifying a drug with its similar drugs, or by reading from the storage means, for each of a plurality of drugs, a package insert that includes the identification information of the drug, the side effects of the drug, and the efficacy or effects of the drug;
Extracting, from the report document, a phrase indicating the side effect of the determination target drug as a determination target side effect phrase;
Acquiring, as a comparison target phrase set, the set of phrases included in the side effect description part of the package insert of a drug recognized as the similar drug, either by reading the package insert of that drug from the storage means and extracting a set of phrases from its side effect description part by a phrase extraction process, or by reading from the storage means side effect learning result information that associates the set of phrases obtained by the phrase extraction process from the side effect description part of the package insert of that drug with the identification information of that drug;
For at least a part of the similar drugs, evaluating each combination of a phrase included in the comparison target phrase set acquired for the similar drug and the determination target side effect phrase, and determining, using the result of the evaluation and a threshold value, whether the side effect indicated by the determination target side effect phrase is a known side effect of the similar drug; and
Outputting a determination result as to whether the side effect indicated by the determination target side effect phrase is a known side effect,
Wherein, in order to evaluate the combination of a phrase included in the comparison target phrase set and the determination target side effect phrase, the determination program causes the computer to execute, as a phrase similarity evaluation process, processing including:
Evaluating the similarity between partial character strings obtained by dividing each of the two phrases, such that when partial character strings of a third length equal to the sum of a first length and a second length match, a higher evaluation is given than the sum of the evaluation given when partial character strings of the first length match and the evaluation given when partial character strings of the second length match;
Evaluating a combination of division patterns, each of which divides one of the two phrases into one or more partial character strings, by totaling the similarities evaluated for the partial character strings; and
Evaluating the similarity between the two phrases using the evaluations of a plurality of combinations of the division patterns of the two phrases.

A determination method in which a computer:
Accepts information identifying a drug;
Identifies the drug indicated by the information as a determination target drug;
Acquires a report document describing a side effect of the determination target drug;
Recognizes a similar drug of the determination target drug from among a plurality of other drugs, either by reading from a storage means similar drug learning result information that associates identification information uniquely identifying a drug with its similar drugs, or by reading from the storage means, for each of a plurality of drugs, a package insert that includes the identification information of the drug, the side effects of the drug, and the efficacy or effects of the drug;
Extracts, from the report document, a phrase indicating the side effect of the determination target drug as a determination target side effect phrase;
Acquires, as a comparison target phrase set, the set of phrases included in the side effect description part of the package insert of a drug recognized as the similar drug, either by reading the package insert of that drug from the storage means and extracting a set of phrases from its side effect description part by a phrase extraction process, or by reading from the storage means side effect learning result information that associates the set of phrases obtained by the phrase extraction process from the side effect description part of the package insert of that drug with the identification information of that drug;
For at least a part of the similar drugs, evaluates each combination of a phrase included in the comparison target phrase set acquired for the similar drug and the determination target side effect phrase, and determines, using the result of the evaluation and a threshold value, whether the side effect indicated by the determination target side effect phrase is a known side effect of the similar drug;
Outputs a determination result as to whether the side effect indicated by the determination target side effect phrase is a known side effect; and
In order to evaluate the combination of a phrase included in the comparison target phrase set and the determination target side effect phrase, executes, as a phrase similarity evaluation process, processing including:
Evaluating the similarity between partial character strings obtained by dividing each of the two phrases, such that when partial character strings of a third length equal to the sum of a first length and a second length match, a higher evaluation is given than the sum of the evaluation given when partial character strings of the first length match and the evaluation given when partial character strings of the second length match;
Evaluating a combination of division patterns, each of which divides one of the two phrases into one or more partial character strings, by totaling the similarities evaluated for the partial character strings; and
Evaluating the similarity between the two phrases using the evaluations of a plurality of combinations of the division patterns of the two phrases.