JP2023013559A - Text data analysis system, text data analysis method, and computer program

Info

Publication number: JP2023013559A
Application number: JP2021117846A
Authority: JP
Inventors: 直上野; Sunao Ueno; 明哲北岡; Akisato Kitaoka; 香奈子安念; Kanako Annen
Original assignee: Ubicom Holdings Inc
Current assignee: Ubicom Holdings Inc
Priority date: 2021-07-16
Filing date: 2021-07-16
Publication date: 2023-01-26

Abstract

To provide a text data analysis system and the like that can extract character strings representing medical practices and the like from text data without adding more keywords to a master than necessary.

SOLUTION: A text data analysis system 1 includes: first search means 13 which searches for a matching character string using, as a first keyword, a character string extracted from text data; first character string generation means 14 which, when no matching character string is found, generates a character string by removing a prefix from the first keyword; second search means 15 which searches for a matching character string using the generated character string as a second keyword; second character string generation means 16 which, when no matching character string is found, generates a character string by removing at least parentheses and the parenthesized character string from the second keyword; similar character string extraction means 17 which uses the generated character string as a third keyword and extracts character strings that contain the third keyword; and output means 19 which outputs information on at least one medical practice corresponding to the extracted character strings.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a text data analysis system, a text data analysis method, and a computer program.

Conventionally, a document reading device that reads documents with an optical character reader is known (Patent Document 1). This technology has a misrecognition database that stores, in association with each other, misrecognized character strings containing misrecognized characters and corrected character strings that correct those characters. Character strings in the data read from a document by the optical character reader are searched for in the misrecognition database, and when a character string is a misrecognized character string, corrected data is created by converting it into the corresponding corrected character string. In addition, character strings that were not corrected properly are added to the misrecognition database, which raises the success rate of correcting misrecognitions.

Patent Document 1: Japanese Patent No. 3349699

Meanwhile, consider searching a master that stores character strings representing medical practices, medicines, or injury or disease names for a character string that matches a keyword, using a predetermined character string as the keyword. If the character string written on an itemized medical statement or the like is not identical to the character string stored in the master, or if the character string written on the statement could not be converted into text data accurately, the character string representing the medical practice or the like cannot be extracted as it stands.

To address this, there is a conventional technique that raises the search success rate by adding a keyword to the master whenever the master stores no character string matching that keyword. However, the character string cannot be extracted until the new keyword has been added to the master, and the data size of the master keeps growing as new keywords are added.

The present invention has been made in view of the above background, and its object is to provide a text data analysis system, a text data analysis method, and a computer program that can extract character strings representing medical practices, medicines, and injury or disease names from text data without adding more keywords to the master than necessary.

A text data analysis system for achieving the above object comprises: data acquisition means for acquiring text data; character string extraction means for extracting, from the text data acquired by the data acquisition means, a group of character strings representing one item; first search means for searching a medical practice/medicine master, which stores character strings representing medical practices or medicines, for a character string that matches a first keyword, the first keyword being the character string extracted by the character string extraction means, and for outputting, when the search hits a character string that matches the first keyword, information on the medical practice or medicine corresponding to the hit character string; first character string generation means for generating, when the search by the first search means hits no character string that matches the first keyword, a character string by removing a prefix from the first keyword with reference to a prefix master that stores prefixes, a prefix being a specific character string attached to the beginning of a character string; second search means for searching the medical practice/medicine master for a character string that matches a second keyword, the second keyword being the character string generated by the first character string generation means, and for outputting, when the search hits a character string that matches the second keyword, information on the medical practice or medicine corresponding to the hit character string; second character string generation means for generating, when the search by the second search means hits no character string that matches the second keyword, a character string by removing from the second keyword at least parentheses and the character string enclosed by the parentheses; similar character string extraction means for extracting from the medical practice/medicine master character strings that contain a third keyword, the third keyword being the character string generated by the second character string generation means; and output means for outputting information on at least one medical practice or medicine corresponding to the character strings extracted by the similar character string extraction means.

With such a system, character strings representing medical practices or medicines can be extracted from text data without adding more keywords to the master than necessary.
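The three-stage lookup described above (exact match, match after prefix removal, containment match after parenthesis removal) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the master contents, the prefix list, the information codes, and all function names are hypothetical.

```python
import re

# Hypothetical sample data; the real masters are not reproduced in this section.
PRACTICE_MEDICINE_MASTER = {
    "blood test": "PROC-001",
    "amlodipine tablet 5mg": "DRUG-001",
    "amlodipine tablet 10mg": "DRUG-002",
}
PREFIX_MASTER = ("*", "[generic]")

# One parenthesized group, half-width or full-width.
PAREN_GROUP = re.compile(r"[(（][^()（）]*[)）]")


def strip_prefix(keyword, prefixes=PREFIX_MASTER):
    """Remove the first registered prefix found at the start of the keyword."""
    for prefix in prefixes:
        if keyword.startswith(prefix):
            return keyword[len(prefix):]
    return keyword


def strip_parentheses(keyword):
    """Remove parentheses together with the text they enclose."""
    return PAREN_GROUP.sub("", keyword).strip()


def look_up(first_keyword, master=PRACTICE_MEDICINE_MASTER):
    """Return the information for every master entry found by the cascade."""
    # Stage 1: exact match on the first keyword.
    if first_keyword in master:
        return [master[first_keyword]]
    # Stage 2: exact match on the second keyword (prefix removed).
    second_keyword = strip_prefix(first_keyword)
    if second_keyword in master:
        return [master[second_keyword]]
    # Stage 3: containment match on the third keyword (parentheses removed).
    third_keyword = strip_parentheses(second_keyword)
    if not third_keyword:
        return []
    return [info for name, info in master.items() if third_keyword in name]
```

For example, `look_up("*amlodipine tablet(brand X)")` fails stages 1 and 2 and then returns both amlodipine entries from stage 3, without any new keyword being added to the master.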

Further, the text data analysis system may further comprise first similarity calculation means for calculating, when the similar character string extraction means has extracted a plurality of character strings, the similarity between each extracted character string and the second keyword. The output means may be configured to output information on the medical practices or medicines corresponding to those extracted character strings whose similarity calculated by the first similarity calculation means is equal to or greater than a predetermined value.

With this configuration, the information on medical practices and medicines can be narrowed down before it is output.
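The narrowing step can be sketched as a threshold filter. This section does not say how similarity is computed, so the sketch assumes the ratio from Python's `difflib.SequenceMatcher`; the threshold of 0.8 and the function names are likewise hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Assumed similarity measure: SequenceMatcher ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def filter_by_similarity(candidates, second_keyword, threshold=0.8):
    """Keep only candidates whose similarity to the second keyword
    is equal to or greater than the (hypothetical) threshold."""
    return [c for c in candidates if similarity(c, second_keyword) >= threshold]
```

With the second keyword `"amlodipine tablet(5mg)"`, the candidate `"amlodipine tablet 5mg"` is kept, while a longer, less similar candidate such as `"amlodipine besilate oral solution"` is dropped.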

Further, the medical practice/medicine master may store character strings representing one or more medical practices or medicines in association with index character strings, an index character string being a character string contained in those character strings. The text data analysis system may further comprise second similarity calculation means for calculating, when the similar character string extraction means could not extract from the medical practice/medicine master any character string containing the third keyword, the similarity between the third keyword and the index character strings. The similar character string extraction means may be configured to extract from the medical practice/medicine master, when there is an index character string whose similarity calculated by the second similarity calculation means is equal to or greater than a predetermined value, the character strings corresponding to that index character string.

With this configuration, character strings representing medical practices or medicines can be extracted from text data more reliably.
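The index character string fallback can be sketched as below. It is useful when the third keyword contains a recognition error and is therefore not contained in any master entry. The index table, the threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical index: index character string -> master entries that contain it.
INDEX_TO_ENTRIES = {
    "amlodipine": ["amlodipine tablet 5mg", "amlodipine tablet 10mg"],
    "loxoprofen": ["loxoprofen tablet 60mg"],
}


def entries_via_index(third_keyword, index_to_entries=INDEX_TO_ENTRIES,
                      threshold=0.8):
    """Return the entries of every index string that is similar enough
    to the third keyword; an empty list means the fallback also failed."""
    matches = []
    for index_string, entries in index_to_entries.items():
        score = SequenceMatcher(None, third_keyword, index_string).ratio()
        if score >= threshold:
            matches.extend(entries)
    return matches
```

For instance, the misread keyword `"amlodlpine"` is not a substring of any entry, but it is close enough to the index string `"amlodipine"` for both amlodipine entries to be returned.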

Further, the text data analysis system may further comprise element character string extraction means for extracting, when there is no index character string whose similarity calculated by the second similarity calculation means is equal to or greater than the predetermined value, at least one element character string from the second keyword with reference to an element character string master that stores element character strings, an element character string being a specific character string contained in character strings representing medical practices or medicines. The similar character string extraction means may be configured to extract from the medical practice/medicine master character strings that contain a fourth keyword, the fourth keyword being the element character string extracted by the element character string extraction means.

With this configuration, character strings representing medical practices or medicines can be extracted from text data even more reliably.
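A minimal sketch of the element character string step follows, assuming a small hypothetical element character string master and plain substring tests. Each element string found in the second keyword becomes a fourth keyword for a containment search of the master.

```python
# Hypothetical element character string master.
ELEMENT_STRING_MASTER = ("injection", "tablet", "ointment")


def extract_element_strings(second_keyword, elements=ELEMENT_STRING_MASTER):
    """Return the registered element strings that occur in the keyword."""
    return [element for element in elements if element in second_keyword]


def search_by_elements(second_keyword, master_names):
    """Map each fourth keyword to the master entries that contain it."""
    results = {}
    for fourth_keyword in extract_element_strings(second_keyword):
        results[fourth_keyword] = [
            name for name in master_names if fourth_keyword in name
        ]
    return results
```

Even when the rest of the keyword is damaged, as in `"insuln injection kit"`, the element string `"injection"` still yields candidate entries.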

Further, the text data analysis system may further comprise third similarity calculation means that, when the element character string extraction means has extracted a plurality of element character strings, calculates the similarity to the second keyword first for the character strings that the similar character string extraction means extracted using, as the fourth keyword, the element character string located near the center of the second keyword, before doing so for the other element character strings. The output means may be configured to output, when the character strings whose similarity was calculated first include a character string whose similarity is equal to or greater than a predetermined value, information on the medical practice or medicine corresponding to that character string.

With this configuration, the amount of processing required before the information on a medical practice or medicine is output can be reduced, which increases processing speed.
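The center-first ordering can be sketched as follows. The sketch assumes that every element string passed in actually occurs in the second keyword, that "near the center" is measured as the distance between the midpoint of the element string and the midpoint of the keyword, and that similarity is the `difflib.SequenceMatcher` ratio. None of these details is fixed by this section.

```python
from difflib import SequenceMatcher


def order_by_centrality(second_keyword, element_strings):
    """Sort element strings so the one nearest the keyword's center is first."""
    middle = len(second_keyword) / 2

    def distance(element):
        start = second_keyword.find(element)  # assumed to be found (>= 0)
        return abs(start + len(element) / 2 - middle)

    return sorted(element_strings, key=distance)


def first_good_match(second_keyword, element_strings, master_names,
                     threshold=0.8):
    """Score candidates element by element, center first, and stop at the
    first candidate whose similarity reaches the threshold."""
    for fourth_keyword in order_by_centrality(second_keyword, element_strings):
        for name in master_names:
            if fourth_keyword not in name:
                continue
            score = SequenceMatcher(None, name, second_keyword).ratio()
            if score >= threshold:
                return name
    return None
```

Because the loop returns as soon as a sufficiently similar entry is found, candidates reached only through the outer element strings are never scored in that case, which is where the reduction in processing comes from.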

Further, the second character string generation means may be configured to generate a character string by additionally removing from the second keyword at least one of the following character strings (1) to (5).
(1) Blanks at the beginning or end
(2) A blank in the middle and the character string after that blank
(3) Middle dots (・)
(4) Japanese commas (、)
(5) Numbers and the unit character string immediately following each number

With this configuration, the information on medical practices and medicines can be narrowed down more easily.
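Removal rules (1) to (5), together with the parenthesis removal, can be sketched with regular expressions. The text only requires at least one of the rules, so applying all of them, the order in which they are applied, and the list of unit strings are assumptions of this sketch.

```python
import re

PAREN_GROUP = re.compile(r"[(（][^()（）]*[)）]")
# Number followed by an optional unit; the unit list is hypothetical.
NUMBER_WITH_UNIT = re.compile(
    r"[0-9０-９]+(?:[.．][0-9０-９]+)?(?:mg|mL|ml|g|錠|％|%)?"
)


def clean_keyword(second_keyword):
    """Build a third keyword by stripping decorations from the second keyword."""
    text = PAREN_GROUP.sub("", second_keyword)           # parentheses and contents
    text = text.strip(" \u3000")                          # (1) leading/trailing blanks
    text = re.split(r"[ \u3000]", text, maxsplit=1)[0]   # (2) inner blank and the rest
    text = text.replace("・", "")                         # (3) middle dots
    text = text.replace("、", "")                         # (4) Japanese commas
    text = NUMBER_WITH_UNIT.sub("", text)                 # (5) numbers and units
    return text
```

A shorter third keyword matches more master entries by containment, so this cleanup is typically paired with a similarity filter that narrows the candidates afterwards.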

Further, a text data analysis system for achieving the above object comprises: data acquisition means for acquiring text data; character string extraction means for extracting, from the text data acquired by the data acquisition means, a group of character strings representing one item; first search means for searching an injury/disease name master, which stores character strings representing injury or disease names, for a character string that matches a first keyword, the first keyword being the character string extracted by the character string extraction means, and for outputting, when the search hits a character string that matches the first keyword, information on the injury or disease name corresponding to the hit character string; first character string generation means for generating, when the search by the first search means hits no character string that matches the first keyword, a character string by removing a suffix from the first keyword with reference to a suffix master that stores suffixes, a suffix being a specific character string attached to the end of a character string; second search means for searching the injury/disease name master for a character string that matches a second keyword, the second keyword being the character string generated by the first character string generation means, and for outputting, when the search hits a character string that matches the second keyword, information on the injury or disease name corresponding to the hit character string; second character string generation means for generating, when the search by the second search means hits no character string that matches the second keyword, a character string by removing a prefix from the second keyword with reference to a prefix master that stores prefixes, a prefix being a specific character string attached to the beginning of a character string; and third search means for searching the injury/disease name master for a character string that matches a third keyword, the third keyword being the character string generated by the second character string generation means, and for outputting, when the search hits a character string that matches the third keyword, information on the injury or disease name corresponding to the hit character string.

With such a system, character strings representing injury or disease names can be extracted from text data without adding more keywords to the master than necessary.
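For injury or disease names the cascade is exact match, then suffix removal, then prefix removal, with an exact-match search after each step. The sketch below assumes hypothetical master contents, suffixes, prefixes, and codes; only the order of the stages is taken from the text.

```python
# Hypothetical sample data.
DISEASE_MASTER = {"gastritis": "D-001", "acute bronchitis": "D-002"}
SUFFIX_MASTER = (" suspected",)
PREFIX_MASTER = ("acute ", "chronic ")


def strip_suffix(keyword, suffixes=SUFFIX_MASTER):
    """Remove the first registered suffix found at the end of the keyword."""
    for suffix in suffixes:
        if keyword.endswith(suffix):
            return keyword[: -len(suffix)]
    return keyword


def strip_prefix(keyword, prefixes=PREFIX_MASTER):
    """Remove the first registered prefix found at the start of the keyword."""
    for prefix in prefixes:
        if keyword.startswith(prefix):
            return keyword[len(prefix):]
    return keyword


def look_up_disease(first_keyword, master=DISEASE_MASTER):
    """Return the matched entry's information, or None if all stages miss."""
    if first_keyword in master:                 # first search
        return master[first_keyword]
    second_keyword = strip_suffix(first_keyword)
    if second_keyword in master:                # second search
        return master[second_keyword]
    third_keyword = strip_prefix(second_keyword)
    if third_keyword in master:                 # third search
        return master[third_keyword]
    return None
```

Note that the prefix is removed only after the suffix-stripped form has failed, so `"acute bronchitis suspected"` resolves to the more specific entry `"acute bronchitis"` at the second search rather than being reduced further.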

Further, the text data analysis system may further comprise: element character string extraction means for extracting, when the search by the third search means hits no character string that matches the third keyword, at least one element character string from the third keyword with reference to an element character string master that stores element character strings, an element character string being a specific character string contained in injury or disease names; similar character string extraction means for extracting from the injury/disease name master character strings that contain a fourth keyword, the fourth keyword being the element character string extracted by the element character string extraction means; and output means for outputting information on at least one injury or disease name corresponding to the character strings extracted by the similar character string extraction means.

With this configuration, character strings representing injury or disease names can be extracted from text data more reliably.

Further, the text data analysis system may further comprise similarity calculation means for calculating the similarity between each character string extracted by the similar character string extraction means and the third keyword. The output means may be configured to output information on the injury or disease names corresponding to those extracted character strings whose similarity calculated by the similarity calculation means is equal to or greater than a predetermined value.

With this configuration, the information on injury or disease names can be narrowed down before it is output.

Further, when the element character string extraction means has extracted a plurality of element character strings, the similarity calculation means may calculate the similarity to the third keyword first for the character strings that the similar character string extraction means extracted using, as the fourth keyword, the element character string located near the center of the third keyword, before doing so for the other element character strings. The output means may be configured to output, when the character strings whose similarity was calculated first include a character string whose similarity is equal to or greater than a predetermined value, information on the injury or disease name corresponding to that character string.

With this configuration, the amount of processing required before the information on an injury or disease name is output can be reduced, which increases processing speed.

Further, in a text data analysis method for achieving the above object, means provided in a computer execute: a data acquisition step of acquiring text data; a character string extraction step of extracting, from the text data acquired in the data acquisition step, a group of character strings representing one item; a first search step of searching a medical practice/medicine master, which stores character strings representing medical practices or medicines, for a character string that matches a first keyword, the first keyword being the character string extracted in the character string extraction step, and of outputting, when the search hits a character string that matches the first keyword, information on the medical practice or medicine corresponding to the hit character string; a first character string generation step of generating, when the search in the first search step hits no character string that matches the first keyword, a character string by removing a prefix from the first keyword with reference to a prefix master that stores prefixes, a prefix being a specific character string attached to the beginning of a character string; a second search step of searching the medical practice/medicine master for a character string that matches a second keyword, the second keyword being the character string generated in the first character string generation step, and of outputting, when the search hits a character string that matches the second keyword, information on the medical practice or medicine corresponding to the hit character string; a second character string generation step of generating, when the search in the second search step hits no character string that matches the second keyword, a character string by removing from the second keyword at least parentheses and the character string enclosed by the parentheses; a similar character string extraction step of extracting from the medical practice/medicine master character strings that contain a third keyword, the third keyword being the character string generated in the second character string generation step; and an output step of outputting information on at least one medical practice or medicine corresponding to the character strings extracted in the similar character string extraction step.

With such a method, character strings representing medical practices or medicines can be extracted from text data without adding more keywords to the master than necessary.

Further, in a text data analysis method for achieving the above object, means provided in a computer execute: a data acquisition step of acquiring text data; a character string extraction step of extracting, from the text data acquired in the data acquisition step, a group of character strings representing one item; a first search step of searching an injury/disease name master, which stores character strings representing injury or disease names, for a character string that matches a first keyword, the first keyword being the character string extracted in the character string extraction step, and of outputting, when the search hits a character string that matches the first keyword, information on the injury or disease name corresponding to the hit character string; a first character string generation step of generating, when the search in the first search step hits no character string that matches the first keyword, a character string by removing a suffix from the first keyword with reference to a suffix master that stores suffixes, a suffix being a specific character string attached to the end of a character string; a second search step of searching the injury/disease name master for a character string that matches a second keyword, the second keyword being the character string generated in the first character string generation step, and of outputting, when the search hits a character string that matches the second keyword, information on the injury or disease name corresponding to the hit character string; a second character string generation step of generating, when the search in the second search step hits no character string that matches the second keyword, a character string by removing a prefix from the second keyword with reference to a prefix master that stores prefixes, a prefix being a specific character string attached to the beginning of a character string; and a third search step of searching the injury/disease name master for a character string that matches a third keyword, the third keyword being the character string generated in the second character string generation step, and of outputting, when the search hits a character string that matches the third keyword, information on the injury or disease name corresponding to the hit character string.

With such a method, character strings representing injury or disease names can be extracted from text data without adding more keywords to the master than necessary.

Further, a computer program for achieving the above object causes a computer to function as: data acquisition means for acquiring text data; character string extraction means for extracting, from the text data acquired by the data acquisition means, a group of character strings representing one item; first search means for searching a medical practice/medicine master, which stores character strings representing medical practices or medicines, for a character string that matches a first keyword, the first keyword being the character string extracted by the character string extraction means, and for outputting, when the search hits a character string that matches the first keyword, information on the medical practice or medicine corresponding to the hit character string; first character string generation means for generating, when the search by the first search means hits no character string that matches the first keyword, a character string by removing a prefix from the first keyword with reference to a prefix master that stores prefixes, a prefix being a specific character string attached to the beginning of a character string; second search means for searching the medical practice/medicine master for a character string that matches a second keyword, the second keyword being the character string generated by the first character string generation means, and for outputting, when the search hits a character string that matches the second keyword, information on the medical practice or medicine corresponding to the hit character string; second character string generation means for generating, when the search by the second search means hits no character string that matches the second keyword, a character string by removing from the second keyword at least parentheses and the character string enclosed by the parentheses; similar character string extraction means for extracting from the medical practice/medicine master character strings that contain a third keyword, the third keyword being the character string generated by the second character string generation means; and output means for outputting information on at least one medical practice or medicine corresponding to the character strings extracted by the similar character string extraction means.

With such a program, character strings representing medical practices or medicines can be extracted from text data without adding more keywords to the master than necessary.

Further, a computer program for achieving the above object causes a computer to function as: data acquisition means for acquiring text data; character string extraction means for extracting, from the text data acquired by the data acquisition means, a group of character strings representing one item; first search means for searching an injury/disease name master, which stores character strings representing injury or disease names, for a character string that matches a first keyword, the first keyword being the character string extracted by the character string extraction means, and for outputting, when the search hits a character string that matches the first keyword, information on the injury or disease name corresponding to the hit character string; first character string generation means for generating, when the search by the first search means hits no character string that matches the first keyword, a character string by removing a suffix from the first keyword with reference to a suffix master that stores suffixes, a suffix being a specific character string attached to the end of a character string; second search means for searching the injury/disease name master for a character string that matches a second keyword, the second keyword being the character string generated by the first character string generation means, and for outputting, when the search hits a character string that matches the second keyword, information on the injury or disease name corresponding to the hit character string; second character string generation means for generating, when the search by the second search means hits no character string that matches the second keyword, a character string by removing a prefix from the second keyword with reference to a prefix master that stores prefixes, a prefix being a specific character string attached to the beginning of a character string; and third search means for searching the injury/disease name master for a character string that matches a third keyword, the third keyword being the character string generated by the second character string generation means, and for outputting, when the search hits a character string that matches the third keyword, information on the injury or disease name corresponding to the hit character string.

According to such a program, character strings representing injury or disease names can be extracted from text data without adding more keywords to the master than necessary.

According to the present invention, character strings representing medical practices, medicines, and injury or disease names can be extracted from text data without adding more keywords to the master than necessary.

FIG. 1 is a block diagram of a text data analysis system according to a first embodiment.
FIG. 2 is a diagram explaining a first medical practice/medicine master.
FIG. 3 is a diagram explaining a second medical practice/medicine master.
FIG. 4 shows a diagram (a) explaining a prefix master and a diagram (b) explaining an element character string master.
FIG. 5 is a diagram explaining a first example of processing in the text data analysis system of the first embodiment.
FIG. 6 is a diagram explaining a second example of processing in the text data analysis system of the first embodiment.
FIG. 7 is a diagram explaining a third example of processing in the text data analysis system of the first embodiment.
FIG. 8 is a diagram, continued from FIG. 7, explaining the third example of processing in the text data analysis system of the first embodiment.
FIG. 9 is a flowchart explaining the operation of the text data analysis system of the first embodiment.
FIG. 10 is a flowchart, continued from FIG. 9, explaining the operation of the text data analysis system of the first embodiment.
FIG. 11 is a block diagram of a text data analysis system according to a second embodiment.
FIG. 12 is a diagram explaining an injury/disease name master.
FIG. 13 shows a diagram (a) explaining a suffix master, a diagram (b) explaining a prefix master, and a diagram (c) explaining an element character string master.
FIG. 14 is a diagram explaining a first example of processing in the text data analysis system of the second embodiment.
FIG. 15 is a diagram explaining a second example of processing in the text data analysis system of the second embodiment.
FIG. 16 is a diagram, continued from FIG. 15, explaining the second example of processing in the text data analysis system of the second embodiment.
FIG. 17 is a flowchart explaining the operation of the text data analysis system of the second embodiment.

Next, a first embodiment will be described.
As shown in FIG. 1, the text data analysis system 1 according to the first embodiment is a system that extracts items related to medical practices and items related to medicines from text data created based on items described in, for example, a medical certificate, a medical statement, or a dispensing statement, converting them into the format listed in the basic masters established by the Ministry of Health, Labour and Welfare. The text data analysis system 1 includes data acquisition means 11, character string extraction means 12, first search means 13, first character string generation means 14, second search means 15, second character string generation means 16, similar character string extraction means 17, first similarity calculation means 18, output means 19, second similarity calculation means 20, element character string extraction means 21, and third similarity calculation means 22.

The text data analysis system 1 consists of a computer having a CPU, RAM, ROM, and the like (not shown) and a storage device 90. The text data analysis system 1 realizes each means by reading a computer program stored in the ROM or the storage device 90 into the RAM and executing it. In other words, the computer program causes the computer constituting the text data analysis system 1 to function as the data acquisition means 11, character string extraction means 12, first search means 13, first character string generation means 14, second search means 15, second character string generation means 16, similar character string extraction means 17, first similarity calculation means 18, output means 19, second similarity calculation means 20, element character string extraction means 21, and third similarity calculation means 22.

The storage device 90 stores a medical practice/medicine master, a prefix master, and an element character string master.
The medical practice/medicine master stores character strings representing medical practices or medicines. Here, a medical practice includes at least one of a medical practice listed in the medical practice master, a medical practice listed in the dental practice master, and a dispensing practice listed in the dispensing practice master. A medicine includes at least one of a medicine listed in the medicine master and a specific piece of equipment listed in the specific equipment master. The medical practice master, dental practice master, dispensing practice master, medicine master, and specific equipment master are basic masters defined by the Ministry of Health, Labour and Welfare.

In the first embodiment, medical practices listed in the medical practice master are used as examples of medical practices, and medicines listed in the medicine master are used as examples of medicines.

The medical practice/medicine master includes a first medical practice/medicine master and a second medical practice/medicine master.

As shown in FIG. 2, the first medical practice/medicine master is configured as a table that stores character strings representing medical practices (abbreviated kanji names and basic kanji names) listed in the medical practice master or the medical fee score table in association with score table classification numbers, and stores character strings representing medicines (kanji names and basic kanji names) listed in the medicine master in association with drug price standard codes.

As shown in FIG. 3, the second medical practice/medicine master is data prepared to make it easier for the similar character string extraction means 17 to extract character strings containing a third keyword, described later. It stores character strings representing one or more medical practices or medicines in association with index character strings, each of which is a character string contained in those character strings. Specifically, the second medical practice/medicine master is configured as a table that stores index character strings such as “食道切除再建術” (esophagectomy with reconstruction), “内視鏡的大腸ポリープ粘膜切除術” (endoscopic mucosal resection of colorectal polyps), and “水晶体再建術” (lens reconstruction) in association with at least one score table classification number or drug price standard code corresponding to the character strings representing medical practices or medicines that contain the index character string.

As shown in FIG. 4(a), the prefix master is configured as a table that stores prefixes, a prefix being a specific character string attached to the beginning of a character string. Prefixes are character strings that are sometimes attached to the beginning of a medical practice or medicine described in a medical statement or the like, for example a body part name or a character string representing a position such as “下” (lower), “上” (upper), “左” (left), “左側” (left side), “右” (right), or “右側” (right side). In the present invention, a character string may consist of a single character. The prefixes can be determined with reference to, for example, the modifiers used as prefixes that are listed in the modifier master, one of the basic masters defined by the Ministry of Health, Labour and Welfare.

As shown in FIG. 4(b), the element character string master is configured as a table that stores element character strings. An element character string is a specific character string contained in character strings representing medical practices or medicines, for example “レンズ” (lens) or “挿入” (insertion). Element character strings are chosen so that they do not produce too many hit candidates when narrowing down the character strings representing medical practices or medicines, for example character strings that narrow the candidates down to about 500 or fewer.

Returning to FIG. 1, the data acquisition means 11 acquires text data. The data acquisition means 11 may be configured, for example, to read and acquire text data that was created in advance and stored in the storage device 90, a storage medium, or the like, or to acquire text data entered through an input device such as a keyboard connected to the text data analysis system 1. The text data created in advance may be, for example, data generated by optical character recognition (OCR) from a medical statement or the like read with a scanner. The data acquisition means 11 may also be configured to acquire text data generated by OCR from a medical statement or the like read with a scanner connected to the text data analysis system 1.

The character string extraction means 12 extracts a group of character strings representing one item from the text data acquired by the data acquisition means 11. When a group of character strings representing one item extends over multiple lines, as shown in FIG. 5(a), the character string extraction means 12 extracts it as a single-line character string, as shown in FIG. 5(b).

The first search means 13 uses the character string extracted by the character string extraction means 12 as a first keyword and searches the medical practice/medicine master for a character string matching the first keyword. Specifically, the first search means 13 refers to the first medical practice/medicine master and searches for a character string matching the first keyword. When a matching character string is hit, the first search means 13 outputs information on the medical practice or medicine corresponding to the hit character string.

Here, in the first embodiment, the first search means 13, the second search means 15, and the output means 19 output, as the information on a medical practice or medicine, at least one of the medical practice name with its score table classification number and the medicine name with its drug price standard code. The medical practice name is at least one of the abbreviated kanji name and the basic kanji name listed in the medical practice master, and the medicine name is at least one of the kanji name and the basic kanji name listed in the medicine master.

The output method is arbitrary. For example, the information may be displayed on a display connected to the text data analysis system 1, printed by a printer connected to the text data analysis system 1, or transmitted to a user terminal connected to the text data analysis system 1 through a network such as the Internet.

When the search by the first search means 13 hits no character string matching the first keyword, the first character string generation means 14 refers to the prefix master and generates a character string in which the prefix is removed from the first keyword. When the first keyword contains no prefix stored in the prefix master, the first character string generation means 14 uses the first keyword itself as the generated character string.
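The prefix removal described above can be sketched as follows. This is a minimal illustration, not the patented implementation: `PREFIX_MASTER` and `strip_prefix` are hypothetical names, the list holds only the position prefixes named in the text, and trying longer prefixes first is an assumption the text does not state.

```python
# Hypothetical stand-in for the prefix master of FIG. 4(a).
PREFIX_MASTER = ["下", "上", "左", "左側", "右", "右側"]


def strip_prefix(first_keyword: str, prefixes=PREFIX_MASTER) -> str:
    """Return the keyword without a leading prefix, or unchanged if none matches."""
    # Assumption: longer prefixes are tried first so that "左側" wins over "左".
    for prefix in sorted(prefixes, key=len, reverse=True):
        if first_keyword.startswith(prefix) and len(first_keyword) > len(prefix):
            return first_keyword[len(prefix):]
    # No stored prefix found: the generated string is the first keyword itself.
    return first_keyword
```

Returning the input unchanged mirrors the rule that the first keyword is used as is when it carries no stored prefix.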

The second search means 15 uses the character string generated by the first character string generation means 14 as a second keyword and searches the medical practice/medicine master for a character string matching the second keyword. Specifically, the second search means 15 refers to the first medical practice/medicine master and searches for a character string matching the second keyword. When a matching character string is hit, the second search means 15 outputs information on the medical practice or medicine corresponding to the hit character string.

When the search by the second search means 15 hits no character string matching the second keyword, the second character string generation means 16 generates a character string in which brackets and the character strings enclosed by those brackets are removed from the second keyword. The second character string generation means 16 also removes the following character strings (1) to (5) from the second keyword:
(1) blanks at the beginning or end
(2) a blank in the middle and the character string following that blank
(3) middle dots
(4) commas
(5) numbers and the character string representing a unit immediately following each number

As an example, the second character string generation means 16 first removes blanks at the beginning or end of the second keyword, and then removes any blank in the middle together with the character string following it. Next, it removes brackets such as round brackets “（” and “）” or corner brackets “「” and “」” together with the character strings they enclose, and then removes middle dots “・” and commas “、”. Finally, it removes numbers and the character string representing a unit immediately following each number, for example “１０ｍｇ” or “２％”.
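The order of removals above can be sketched with regular expressions. This is an illustration under assumptions: the function name `build_third_keyword`, the bracket pairs covered, and the short unit list are hypothetical, and the sample strings in the usage note are made up.

```python
import re

# Round brackets (full-width and ASCII) and corner brackets with their contents.
BRACKETED = re.compile(r"（[^（）]*）|\([^()]*\)|「[^「」]*」")
# Digits (ASCII or full-width), optional decimals, and an optional trailing unit.
# The unit list is a small hypothetical sample, not taken from the patent.
NUMBER_WITH_UNIT = re.compile(
    r"[0-9０-９]+(?:[.．][0-9０-９]+)?(?:mg|ｍｇ|mL|ｍＬ|g|ｇ|%|％)?"
)


def build_third_keyword(second_keyword: str) -> str:
    """Apply the removals in the order given in the example above."""
    s = second_keyword.strip()              # (1) leading/trailing blanks
    s = re.split(r"\s", s, maxsplit=1)[0]   # (2) interior blank and what follows
    s = BRACKETED.sub("", s)                # brackets and enclosed strings
    s = s.replace("・", "").replace("、", "")  # (3) middle dots, (4) commas
    s = NUMBER_WITH_UNIT.sub("", s)         # (5) numbers and following units
    return s
```

For instance, `build_third_keyword("Ａ錠１０ｍｇ　後発")` yields `"Ａ錠"`: the part after the full-width space is dropped first, then the number and its unit.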

When the first keyword and the second keyword are the same character string, the second search means 15 may skip the search for a character string matching the second keyword, and the second character string generation means 16 may directly generate the character string in which the brackets, the character strings enclosed by the brackets, and so on are removed from the second keyword.

When the second keyword contains neither brackets with enclosed character strings nor any of the character strings (1) to (5) above, the second character string generation means 16 uses the second keyword itself as the generated character string.

The similar character string extraction means 17 uses the character string generated by the second character string generation means 16 as a third keyword and extracts character strings containing the third keyword from the medical practice/medicine master. Specifically, the similar character string extraction means 17 refers to the second medical practice/medicine master and the first medical practice/medicine master and extracts character strings containing the third keyword (hereinafter also referred to as "first similar character strings"). More specifically, the similar character string extraction means 17 uses the third keyword to extract, from the second medical practice/medicine master, the score table classification numbers or drug price standard codes associated with the first similar character strings, and then extracts, from the first medical practice/medicine master, the first similar character strings associated with the extracted score table classification numbers or drug price standard codes.
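The two-stage lookup (index character string to codes, then codes to names) can be sketched as below. Everything here is hypothetical: the codes `X001` to `X003`, the placeholder names ending in （例Ａ） and （例Ｂ）, the dictionary layout, and the assumption that the third keyword is matched exactly against an index character string.

```python
# Hypothetical miniature of the first master: code -> registered name.
FIRST_MASTER = {
    "X001": "水晶体再建術（例Ａ）",
    "X002": "水晶体再建術（例Ｂ）",
    "X003": "食道切除再建術",
}
# Hypothetical miniature of the second master: index string -> codes.
SECOND_MASTER = {
    "水晶体再建術": ["X001", "X002"],
    "食道切除再建術": ["X003"],
}


def extract_first_similar(third_keyword: str) -> list[str]:
    """Resolve the third keyword to codes, then the codes to registered names."""
    codes = SECOND_MASTER.get(third_keyword, [])
    return [FIRST_MASTER[code] for code in codes if code in FIRST_MASTER]
```

Keeping the index in a separate table means the containment search runs against a short list of index strings rather than every registered name, which appears to be the motivation given for the second master.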

When the similar character string extraction means 17 has extracted a plurality of first similar character strings, the first similarity calculation means 18 calculates the similarity between each extracted first similar character string and the second keyword.

Here, in the first embodiment, the first similarity calculation means 18, the second similarity calculation means 20, and the third similarity calculation means 22 calculate the similarity between two character strings based on, as an example, at least one of the Levenshtein distance and the Jaro-Winkler distance.

The output means 19 outputs information on at least one medical practice or medicine corresponding to the first similar character strings extracted by the similar character string extraction means 17. Specifically, when the similar character string extraction means 17 has extracted exactly one first similar character string, the output means 19 outputs information on the one medical practice or medicine corresponding to that first similar character string. When a plurality of first similar character strings have been extracted, the output means 19 outputs information on the medical practices or medicines corresponding to those first similar character strings whose similarity, as calculated by the first similarity calculation means 18, is equal to or greater than a predetermined value.

In the first embodiment, the similarity is calculated as a numerical value from 0 to 1; a larger value indicates a higher similarity between the two character strings, and a value of 1 indicates that the two character strings match. The predetermined similarity value is, for example, 0.8. The text data analysis system 1 may be configured so that the user can set the predetermined similarity value arbitrarily.
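A similarity in the range 0 to 1 can be derived from the Levenshtein distance, for example as sketched below. The normalization by the longer string's length is an assumption (the text does not give the formula), and the names `levenshtein`, `similarity`, and `filter_by_similarity` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for no overlap."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def filter_by_similarity(keyword: str, candidates: list[str],
                         threshold: float = 0.8) -> list[str]:
    """Keep candidates whose similarity to the keyword reaches the threshold."""
    return [c for c in candidates if similarity(keyword, c) >= threshold]
```

The default threshold of 0.8 follows the example value in the text; exposing it as a parameter reflects the note that the user may set it.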

When the similar character string extraction means 17 cannot extract a character string containing the third keyword (a first similar character string) from the medical practice/medicine master, the second similarity calculation means 20 calculates the similarity between the third keyword generated by the second character string generation means 16 and the index character strings stored in the second medical practice/medicine master.

When there is an index character string whose similarity, as calculated by the second similarity calculation means 20, is equal to or greater than the predetermined value, the similar character string extraction means 17 extracts the character strings corresponding to that index character string from the medical practice/medicine master. Specifically, when there is an index character string whose similarity to the third keyword is equal to or greater than the predetermined value, the similar character string extraction means 17 extracts, from the second medical practice/medicine master, the score table classification numbers or drug price standard codes associated with that similar index character string, and then extracts, from the first medical practice/medicine master, the character strings associated with the extracted score table classification numbers or drug price standard codes (hereinafter also referred to as "corresponding character strings").

When the similar character string extraction means 17 has extracted a plurality of corresponding character strings, the first similarity calculation means 18 calculates the similarity between each extracted corresponding character string and the second keyword, and the output means 19 outputs information on the medical practices or medicines corresponding to those corresponding character strings whose similarity, as calculated by the first similarity calculation means 18, is equal to or greater than the predetermined value.

When there is no index character string whose similarity, as calculated by the second similarity calculation means 20, is equal to or greater than the predetermined value, the element character string extraction means 21 refers to the element character string master and extracts at least one element character string from the second keyword.

The similar character string extraction means 17 uses an element character string extracted by the element character string extraction means 21 as a fourth keyword and extracts character strings containing the fourth keyword from the medical practice/medicine master. Specifically, the similar character string extraction means 17 extracts character strings containing the fourth keyword (hereinafter also referred to as "second similar character strings") from the first medical practice/medicine master. In the first embodiment, when the element character string extraction means 21 has extracted a plurality of element character strings, the similar character string extraction means 17 first uses the element character string located closest to the center of the second keyword (hereinafter also referred to as the "priority element character string" in the first embodiment) as the fourth keyword and extracts second similar character strings from the medical practice/medicine master.

The third similarity calculation means 22 calculates the similarity between the second similar character strings extracted by the similar character string extraction means 17 and the second keyword. Specifically, the third similarity calculation means 22 calculates the similarity to the second keyword for the second similar character strings extracted with the priority element character string as the fourth keyword before doing so for those of the other element character strings.

When the second similar character strings for which the third similarity calculation means 22 calculated the similarity first include a character string whose similarity is equal to or greater than the predetermined value, the output means 19 outputs information on the medical practice or medicine corresponding to that second similar character string. Once the output means 19 has output this information, the similar character string extraction means 17 does not extract second similar character strings for the other element character strings, and the third similarity calculation means 22 does not calculate their similarity.

When the second similar character strings for which the similarity was calculated first include no character string whose similarity is equal to or greater than the predetermined value, the similar character string extraction means 17 uses the element character string next closest to the center of the second keyword as the fourth keyword and extracts second similar character strings from the first medical practice/medicine master, and the third similarity calculation means 22 calculates the similarity between the extracted second similar character strings and the second keyword. When these second similar character strings include a character string whose similarity is equal to or greater than the predetermined value, the output means 19 outputs information on the medical practice or medicine corresponding to that second similar character string.

When the element character string extraction means 21 has extracted a plurality of element character strings and two of them are equidistant from the center of the second keyword, the similar character string extraction means 17 uses each of these element character strings as a fourth keyword, extracts second similar character strings from the medical practice/medicine master for each, and the similarity between each extracted second similar character string and the second keyword is calculated. When these second similar character strings include a character string whose similarity is equal to or greater than the predetermined value, the output means 19 outputs information on the medical practice or medicine corresponding to that second similar character string. In this case, the similarity may be calculated first for the element character string that yielded fewer second similar character strings; if that set includes a character string whose similarity is equal to or greater than the predetermined value, information on the corresponding medical practice or medicine may be output and the processing ended.
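The ordering of element character strings by closeness to the center of the second keyword can be sketched as follows. The distance measure (midpoint of the element versus midpoint of the keyword, in characters) is an assumption, since the text does not define "close to the center" precisely; `ELEMENT_MASTER` and `rank_elements` are hypothetical names.

```python
# The two element character strings named as examples in the text.
ELEMENT_MASTER = ["レンズ", "挿入"]


def rank_elements(second_keyword: str, elements=ELEMENT_MASTER) -> list[str]:
    """Return the elements found in the keyword, closest to its center first."""
    center = len(second_keyword) / 2
    found = []
    for element in elements:
        pos = second_keyword.find(element)
        if pos >= 0:
            # Assumed measure: distance between the two midpoints.
            found.append((abs(pos + len(element) / 2 - center), element))
    # Stable sort: elements at equal distance keep their master order.
    found.sort(key=lambda pair: pair[0])
    return [element for _, element in found]
```

With the second keyword from the first example, `rank_elements("水晶体再建術（眼内レンズを挿入する場合・その他のもの）")` returns `["挿入", "レンズ"]`, so "挿入" would be tried first as the fourth keyword. The sketch does not implement the separate handling of equidistant elements described above.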

ここで、具体的な例を示しながら、第１実施形態のテキストデータ解析システム１における処理について説明する。 Here, the processing in the text data analysis system 1 of the first embodiment will be described using specific examples.

第１の例として、図５（ｃ）に示すように、第１検索手段１３は、データ取得手段１１が取得し、文字列抽出手段１２が抽出した文字列“左水晶体再建術（眼内レンズを挿入する場合・その他のもの）”を第１キーワードとして、第１医療行為・医薬品マスタから、第１キーワードと一致する文字列がヒットするかを検索する。 As a first example, as shown in FIG. 5(c), the first search means 13 takes the character string "左水晶体再建術（眼内レンズを挿入する場合・その他のもの）" ("left lens reconstruction (when inserting an intraocular lens / others)"), acquired by the data acquisition means 11 and extracted by the character string extraction means 12, as the first keyword and searches the first medical practice/drug master for a character string matching the first keyword.

第１文字列生成手段１４は、第１検索手段１３による検索の結果、第１キーワードと一致する文字列がヒットしない場合、図５（ｄ）に示すように、第１キーワードから、接頭語マスタを参照して、接頭語“左”を取り除いた文字列“水晶体再建術（眼内レンズを挿入する場合・その他のもの）”を生成する。 If the search by the first search means 13 finds no character string matching the first keyword, the first character string generation means 14 refers to the prefix master and, as shown in FIG. 5(d), generates the character string "水晶体再建術（眼内レンズを挿入する場合・その他のもの）" by removing the prefix "左" ("left") from the first keyword.

第２検索手段１５は、第１文字列生成手段１４が生成した文字列“水晶体再建術（眼内レンズを挿入する場合・その他のもの）”を第２キーワードとして、第１医療行為・医薬品マスタから、第２キーワードと一致する文字列がヒットするかを検索する。 The second search means 15 takes the character string "水晶体再建術（眼内レンズを挿入する場合・その他のもの）" generated by the first character string generation means 14 as the second keyword and searches the first medical practice/drug master for a character string matching the second keyword.

第２文字列生成手段１６は、第２検索手段１５による検索の結果、第２キーワードと一致する文字列がヒットしない場合、図５（ｅ）に示すように、第２キーワードから、括弧および当該括弧によって囲われた文字列“（眼内レンズを挿入する場合・その他のもの）”を取り除いた文字列“水晶体再建術”を生成する。 If the search by the second search means 15 finds no character string matching the second keyword, the second character string generation means 16, as shown in FIG. 5(e), generates the character string "水晶体再建術" ("lens reconstruction") by removing from the second keyword the parentheses and the character string they enclose, "（眼内レンズを挿入する場合・その他のもの）".

類似文字列抽出手段１７は、第２文字列生成手段１６が生成した文字列“水晶体再建術”を第３キーワードとして、第２医療行為・医薬品マスタから、第３キーワードを含む文字列（第１類似文字列）に対応づけられた点数表区分番号「Ｋ２８２１イ」、「Ｋ２８２１ロ」、「Ｋ２８２２」および「Ｋ２８２３」を抽出する。その後、類似文字列抽出手段１７は、第１医療行為・医薬品マスタから、抽出した点数表区分番号に対応づけられた、図５（ｆ）に示す４つの第１類似文字列を抽出する。 The similar character string extraction means 17 takes the character string "水晶体再建術" generated by the second character string generation means 16 as the third keyword and extracts, from the second medical practice/drug master, the score table classification numbers "K2821イ", "K2821ロ", "K2822" and "K2823" associated with character strings containing the third keyword (first similar character strings). The similar character string extraction means 17 then extracts, from the first medical practice/drug master, the four first similar character strings shown in FIG. 5(f) that are associated with the extracted score table classification numbers.

第１類似度算出手段１８は、類似文字列抽出手段１７が抽出した第１類似文字列が複数ある場合、図５（ｇ）に示すように、各第１類似文字列と、第２キーワード“水晶体再建術（眼内レンズを挿入する場合・その他のもの）”との類似度をそれぞれ算出する。 When the similar character string extraction means 17 has extracted a plurality of first similar character strings, the first similarity calculation means 18 calculates, as shown in FIG. 5(g), the similarity between each first similar character string and the second keyword "水晶体再建術（眼内レンズを挿入する場合・その他のもの）".

出力手段１９は、類似文字列抽出手段１７が抽出した第１類似文字列のうち、第１類似度算出手段１８が算出した類似度が所定値０．８以上である第１類似文字列に対応する医療行為・医薬品の情報を出力する。具体的には、出力手段１９は、診療行為名「水晶体再建術（眼内レンズを挿入する場合）（その他のもの）」と点数表区分番号「Ｋ２８２１ロ」、および、診療行為名「水晶体再建術（眼内レンズを挿入しない場合）」と点数表区分番号「Ｋ２８２２」の情報を出力する。なお、出力手段１９は、第１類似度算出手段１８が算出した類似度の情報をさらに出力する構成であってもよい。 Of the first similar character strings extracted by the similar character string extraction means 17, the output means 19 outputs the medical practice/drug information corresponding to those whose similarity calculated by the first similarity calculation means 18 is equal to or greater than the predetermined value of 0.8. Specifically, the output means 19 outputs the medical practice name "水晶体再建術（眼内レンズを挿入する場合）（その他のもの）" ("lens reconstruction (when inserting an intraocular lens) (others)") with the score table classification number "K2821ロ", and the medical practice name "水晶体再建術（眼内レンズを挿入しない場合）" ("lens reconstruction (when not inserting an intraocular lens)") with the score table classification number "K2822". The output means 19 may also be configured to additionally output the similarity calculated by the first similarity calculation means 18.
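The keyword generation in the first example (first keyword, prefix removed, parenthesized portion removed) can be sketched in Python as below. The names `PREFIXES`, `strip_prefix`, `strip_parenthesized` and `generate_keywords` are hypothetical, and the prefix list is an assumed stand-in for the prefix master (only "左" appears in the example). The sketch covers only the removal of parentheses and their contents, not any other character strings the second character string generation means may remove.

```python
import re

# Hypothetical stand-in for the prefix master; only "左" appears in the example.
PREFIXES = ("左", "右")

# Matches one innermost parenthesized span, full-width or half-width brackets.
_PAREN = re.compile(r"[（(][^（()）]*[)）]")


def strip_prefix(keyword, prefixes=PREFIXES):
    """Remove a leading prefix if the keyword starts with one."""
    for prefix in prefixes:
        if keyword.startswith(prefix):
            return keyword[len(prefix):]
    return keyword


def strip_parenthesized(keyword):
    """Remove parentheses and the text they enclose, including nested ones."""
    previous = None
    while previous != keyword:
        previous = keyword
        keyword = _PAREN.sub("", keyword)
    return keyword


def generate_keywords(extracted):
    """Return the first, second and third keywords for an extracted string."""
    first = extracted
    second = strip_prefix(first)
    third = strip_parenthesized(second)
    return first, second, third
```

In the flow described above each keyword would be generated only after the preceding search fails; the sketch computes all three at once purely to show the string transformations.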

次に、第２の例について説明する。第２の例は、第１の例と同じ文字列“左水晶体再建術（眼内レンズを挿入する場合・その他のもの）”について、“水晶体再建術”の部分が、ＯＣＲにより“水昌体再律術”と誤って読み取られた場合である。 Next, a second example is described. It uses the same character string as the first example, "左水晶体再建術（眼内レンズを挿入する場合・その他のもの）", but the portion "水晶体再建術" ("lens reconstruction") has been misread by OCR as "水昌体再律術".

第２の例の場合、図６（ａ）に示すように、第１検索手段１３は、文字列“左水昌体再律術（眼内レンズを挿入する場合・その他のもの）”を第１キーワードとして、第１医療行為・医薬品マスタから、第１キーワードと一致する文字列がヒットするかを検索し、ヒットしないので、図６（ｂ）に示すように、第１文字列生成手段１４は、第１キーワードから、接頭語マスタを参照して、接頭語“左”を取り除いた文字列“水昌体再律術（眼内レンズを挿入する場合・その他のもの）”を生成する。 In the second example, as shown in FIG. 6(a), the first search means 13 uses the character string "左水昌体再律術（眼内レンズを挿入する場合・その他のもの）" as the first keyword and searches the first medical practice/drug master for a matching character string. Because there is no hit, the first character string generation means 14 refers to the prefix master and, as shown in FIG. 6(b), generates the character string "水昌体再律術（眼内レンズを挿入する場合・その他のもの）" by removing the prefix "左" from the first keyword.

第２検索手段１５は、文字列“水昌体再律術（眼内レンズを挿入する場合・その他のもの）”を第２キーワードとして、第１医療行為・医薬品マスタから、第２キーワードと一致する文字列がヒットするかを検索し、ヒットしないので、図６（ｃ）に示すように、第２文字列生成手段１６は、第２キーワードから、括弧および当該括弧によって囲われた文字列“（眼内レンズを挿入する場合・その他のもの）”を取り除いた文字列“水昌体再律術”を生成する。 The second search means 15 uses the character string "水昌体再律術（眼内レンズを挿入する場合・その他のもの）" as the second keyword and searches the first medical practice/drug master for a matching character string. Because there is no hit, the second character string generation means 16, as shown in FIG. 6(c), generates the character string "水昌体再律術" by removing from the second keyword the parentheses and the character string they enclose, "（眼内レンズを挿入する場合・その他のもの）".

類似文字列抽出手段１７は、文字列“水昌体再律術”を第３キーワードとして、第２医療行為・医薬品マスタから、第３キーワードを含む文字列を抽出しようとするが、第２の例では、第３キーワードを含む文字列を抽出することができない。そこで、図６（ｄ）に示すように、第２類似度算出手段２０は、第３キーワード“水昌体再律術”と、第２医療行為・医薬品マスタに記憶されている各索引文字列との類似度をそれぞれ算出する。 The similar character string extraction means 17 uses the character string "水昌体再律術" as the third keyword and attempts to extract character strings containing it from the second medical practice/drug master, but in the second example no such character string can be extracted. Therefore, as shown in FIG. 6(d), the second similarity calculation means 20 calculates the similarity between the third keyword "水昌体再律術" and each index character string stored in the second medical practice/drug master.

類似文字列抽出手段１７は、第２類似度算出手段２０が算出した類似度が所定値０．８以上である索引文字列がある場合、第１医療行為・医薬品マスタから、当該索引文字列に対応する文字列（対応文字列）を抽出する。具体的には、類似文字列抽出手段１７は、類似度が所定値０．８以上である索引文字列“水晶体再建術”があるので、第１医療行為・医薬品マスタから、図５（ｅ）に示す４つの対応文字列を抽出する。 If there is an index character string whose similarity calculated by the second similarity calculation means 20 is equal to or greater than the predetermined value of 0.8, the similar character string extraction means 17 extracts the character strings corresponding to that index character string (corresponding character strings) from the first medical practice/drug master. Specifically, because the index character string "水晶体再建術" has a similarity of 0.8 or more, the similar character string extraction means 17 extracts the four corresponding character strings shown in FIG. 5(e) from the first medical practice/drug master.

第１類似度算出手段１８は、類似文字列抽出手段１７が抽出した対応文字列が複数ある場合、図６（ｆ）に示すように、各対応文字列と、第２キーワード“水昌体再律術（眼内レンズを挿入する場合・その他のもの）”との類似度をそれぞれ算出する。 When the similar character string extraction means 17 has extracted a plurality of corresponding character strings, the first similarity calculation means 18 calculates, as shown in FIG. 6(f), the similarity between each corresponding character string and the second keyword "水昌体再律術（眼内レンズを挿入する場合・その他のもの）".

出力手段１９は、類似文字列抽出手段１７が抽出した対応文字列のうち、第１類似度算出手段１８が算出した類似度が所定値０．８以上である対応文字列に対応する医療行為・医薬品の情報を出力する。 Of the corresponding character strings extracted by the similar character string extraction means 17, the output means 19 outputs the medical practice/drug information corresponding to those whose similarity calculated by the first similarity calculation means 18 is equal to or greater than the predetermined value of 0.8.
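The fallback used in the second example (substring match against the index character strings first, then a similarity comparison when nothing contains the third keyword) can be sketched in Python as follows. The function names and the sample index strings are hypothetical. The similarity measure is an assumption: this passage does not state which measure the system uses, so `difflib.SequenceMatcher` from the Python standard library is substituted for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Assumed stand-in similarity in [0, 1]; not the patent's own measure."""
    return SequenceMatcher(None, a, b).ratio()


def find_index_strings(third_keyword, index_strings, threshold=0.8):
    """Return index strings that contain the keyword, else similar ones."""
    contained = [s for s in index_strings if third_keyword in s]
    if contained:
        return contained
    # No index string contains the keyword: fall back to similarity.
    return [s for s in index_strings
            if similarity(third_keyword, s) >= threshold]


# Usage with hypothetical English index strings and a one-character misread.
index = ["lens reconstruction", "cataract surgery"]
print(find_index_strings("lens reconstructlon", index))
```

With this stand-in measure the misread "水昌体再律術" scores below 0.8 against "水晶体再建術", so the measure actually used by the system evidently differs; the sketch is meant to show the control flow, not to reproduce the similarity values in the figures.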

次に、第３の例について説明する。第３の例は、診療明細書等に記載された項目“短手３（水晶体再建術・眼内レンズ挿入・その他のもの(片側))”が、ＯＣＲにより“垣手３（水晶体再建指・硯内レンズ挿入・その他のもの(片例))”と誤って読み取られた場合である。 Next, a third example is described. In it, the item "短手３（水晶体再建術・眼内レンズ挿入・その他のもの(片側))" ("短手３ (lens reconstruction, intraocular lens insertion, others (one side))") written on a medical statement or the like has been misread by OCR as "垣手３（水晶体再建指・硯内レンズ挿入・その他のもの(片例))".

第３の例の場合、図７（ａ）に示すように、第１検索手段１３は、文字列“垣手３（水晶体再建指・硯内レンズ挿入・その他のもの(片例))”を第１キーワードとして、第１医療行為・医薬品マスタから、第１キーワードと一致する文字列がヒットするかを検索し、ヒットしないので、第１文字列生成手段１４は、第１キーワードから、接頭語マスタを参照して、接頭語を取り除いた文字列を生成しようとする。第３の例では、第１キーワードに、接頭語マスタに記憶された接頭語がないため、図７（ｂ）に示すように、第１文字列生成手段１４は、第１キーワードと同じ文字列“垣手３（水晶体再建指・硯内レンズ挿入・その他のもの(片例))”を生成する。 In the third example, as shown in FIG. 7(a), the first search means 13 uses the character string "垣手３（水晶体再建指・硯内レンズ挿入・その他のもの(片例))" as the first keyword and searches the first medical practice/drug master for a matching character string. Because there is no hit, the first character string generation means 14 refers to the prefix master and attempts to generate a character string with the prefix removed from the first keyword. In the third example the first keyword contains no prefix stored in the prefix master, so, as shown in FIG. 7(b), the first character string generation means 14 generates the same character string as the first keyword, "垣手３（水晶体再建指・硯内レンズ挿入・その他のもの(片例))".

第２検索手段１５は、文字列“垣手３（水晶体再建指・硯内レンズ挿入・その他のもの(片例))”を第２キーワードとして、第１医療行為・医薬品マスタから、第２キーワードと一致する文字列がヒットするかを検索し、ヒットしないので、図７（ｃ）に示すように、第２文字列生成手段１６は、第２キーワードから、括弧および当該括弧によって囲われた文字列“（水晶体再建指・硯内レンズ挿入・その他のもの(片例))”を取り除いた文字列“垣手３”を生成する。 The second search means 15 uses the character string "垣手３（水晶体再建指・硯内レンズ挿入・その他のもの(片例))" as the second keyword and searches the first medical practice/drug master for a matching character string. Because there is no hit, the second character string generation means 16, as shown in FIG. 7(c), generates the character string "垣手３" by removing from the second keyword the parentheses and the character string they enclose, "（水晶体再建指・硯内レンズ挿入・その他のもの(片例))".

類似文字列抽出手段１７は、文字列“垣手３”を第３キーワードとして、第２医療行為・医薬品マスタから、第３キーワードを含む文字列を抽出しようとするが、第３の例では、第３キーワードを含む文字列を抽出することができない。そこで、第２類似度算出手段２０は、第３キーワード“垣手３”と、第２医療行為・医薬品マスタに記憶されている各索引文字列との類似度をそれぞれ算出する。 The similar character string extraction means 17 uses the character string "垣手３" as the third keyword and attempts to extract character strings containing it from the second medical practice/drug master, but in the third example no such character string can be extracted. The second similarity calculation means 20 therefore calculates the similarity between the third keyword "垣手３" and each index character string stored in the second medical practice/drug master.

第３の例では、第２類似度算出手段２０が算出した類似度は、すべて所定値０．８未満となり、図７（ｄ）に示すように、要素文字列抽出手段２１は、要素文字列マスタを参照して、第２キーワード（図７（ｂ）参照）から、要素文字列“レンズ”および“挿入”を抽出する。 In the third example, all of the similarities calculated by the second similarity calculation means 20 are below the predetermined value of 0.8, so, as shown in FIG. 7(d), the element character string extraction means 21 refers to the element character string master and extracts the element character strings "レンズ" ("lens") and "挿入" ("insertion") from the second keyword (see FIG. 7(b)).

類似文字列抽出手段１７は、要素文字列抽出手段２１が抽出した要素文字列が複数あるので、第２キーワードの中央に近い位置にある要素文字列“レンズ”を、まず、第４キーワードとして、第１医療行為・医薬品マスタから、図８（ａ）に示すような、第４キーワード“レンズ”を含む文字列（第２類似文字列）を抽出する。 Because the element character string extraction means 21 has extracted a plurality of element character strings, the similar character string extraction means 17 first takes the element character string "レンズ", which is closest to the center of the second keyword, as the fourth keyword and extracts from the first medical practice/drug master the character strings containing the fourth keyword "レンズ" (second similar character strings), as shown in FIG. 8(a).

図８（ｂ）に示すように、第３類似度算出手段２２は、抽出した第２類似文字列について、第２キーワード“垣手３（水晶体再建指・硯内レンズ挿入・その他のもの(片例))”との類似度をそれぞれ算出する。 As shown in FIG. 8(b), the third similarity calculation means 22 calculates the similarity between each extracted second similar character string and the second keyword "垣手３（水晶体再建指・硯内レンズ挿入・その他のもの(片例))".

出力手段１９は、類似文字列抽出手段１７が抽出した第２類似文字列のうち、第３類似度算出手段２２が算出した類似度が所定値０．８以上である類似文字列に対応する医療行為または医薬品の情報を出力する。具体的には、出力手段１９は、診療行為名「短手３（水晶体再建術・眼内レンズ挿入・その他・片側）」と点数表区分番号「Ａ４００３ホ」、診療行為名「短手３（水晶体再建術・眼内レンズ挿入・その他・両側）」と点数表区分番号「Ａ４００３ヘ」、および、診療行為名「短手３（水晶体再建術・眼内レンズ挿入・その他・片側）（生活療養）」と点数表区分番号「Ａ４００３ホ」の情報を出力する。なお、出力手段１９は、第３類似度算出手段２２が算出した類似度の情報をさらに出力する構成であってもよい。 Of the second similar character strings extracted by the similar character string extraction means 17, the output means 19 outputs the medical practice or drug information corresponding to those whose similarity calculated by the third similarity calculation means 22 is equal to or greater than the predetermined value of 0.8. Specifically, the output means 19 outputs the medical practice name "短手３（水晶体再建術・眼内レンズ挿入・その他・片側）" with the score table classification number "A4003ホ", the medical practice name "短手３（水晶体再建術・眼内レンズ挿入・その他・両側）" with the score table classification number "A4003ヘ", and the medical practice name "短手３（水晶体再建術・眼内レンズ挿入・その他・片側）（生活療養）" with the score table classification number "A4003ホ". The output means 19 may also be configured to additionally output the similarity calculated by the third similarity calculation means 22.

次に、第１実施形態のテキストデータ解析システム１の動作（テキストデータ解析システム１を構成するコンピュータが備える手段が実行するテキストデータ解析方法）の一例について、フローチャートを参照しながら説明する。 Next, an example of the operation of the text data analysis system 1 of the first embodiment (text data analysis method executed by the means provided in the computer constituting the text data analysis system 1) will be described with reference to a flowchart.

図９に示すように、テキストデータ解析システム１は、まず、テキストデータを取得する（Ｓ１１０）（データ取得ステップ）。次に、テキストデータ解析システム１は、データ取得ステップで取得したテキストデータから、一の項目を表す一群の文字列を抽出する（Ｓ１２０）（文字列抽出ステップ）。 As shown in FIG. 9, the text data analysis system 1 first acquires text data (S110) (data acquisition step). Next, the text data analysis system 1 extracts a group of character strings representing one item from the text data acquired in the data acquisition step (S120) (character string extraction step).

次に、テキストデータ解析システム１は、文字列抽出ステップで抽出した文字列を第１キーワードとして、第１医療行為・医薬品マスタから、第１キーワードと一致する文字列がヒットするかを検索する（Ｓ１３１）（第１検索ステップ）。そして、検索の結果、第１キーワードと一致する文字列がヒットした場合（Ｓ１３２，Ｙｅｓ）、ステップＳ１８３に進む。 Next, using the character string extracted in the character string extraction step as the first keyword, the text data analysis system 1 searches the first medical practice/pharmaceutical master for a character string that matches the first keyword ( S131) (first search step). If a character string matching the first keyword is hit as a result of the search (S132, Yes), the process proceeds to step S183.

一方、第１検索ステップにおける検索の結果、第１キーワードと一致する文字列がヒットしない場合（Ｓ１３２，Ｎｏ）、テキストデータ解析システム１は、接頭語マスタを参照して、第１キーワードから、接頭語を取り除いた文字列を生成する（Ｓ１４０）（第１文字列生成ステップ）。 On the other hand, if the search result in the first search step does not hit a character string that matches the first keyword (S132, No), the text data analysis system 1 refers to the prefix master and extracts the prefix from the first keyword. A character string with words removed is generated (S140) (first character string generation step).

次に、テキストデータ解析システム１は、第１文字列生成ステップで生成した文字列を第２キーワードとして、第１医療行為・医薬品マスタから、第２キーワードと一致する文字列がヒットするかを検索する（Ｓ１５１）（第２検索ステップ）。そして、検索の結果、第２キーワードと一致する文字列がヒットした場合（Ｓ１５２，Ｙｅｓ）、ステップＳ１８３に進む。 Next, the text data analysis system 1 uses the character string generated in the first character string generation step as the second keyword, and searches the first medical practice/pharmaceutical master for a character string that matches the second keyword. (S151) (second search step). Then, if a character string that matches the second keyword is hit as a result of the search (S152, Yes), the process proceeds to step S183.

一方、第２検索ステップにおける検索の結果、第２キーワードと一致する文字列がヒットしない場合（Ｓ１５２，Ｎｏ）、テキストデータ解析システム１は、第２キーワードから、括弧および当該括弧によって囲われた文字列等を取り除いた文字列を生成する（Ｓ１６０）（第２文字列生成ステップ）。 On the other hand, if the search result in the second search step does not hit a character string that matches the second keyword (S152, No), the text data analysis system 1 extracts the parentheses and the characters enclosed by the parentheses from the second keyword. A character string is generated by removing columns and the like (S160) (second character string generating step).

次に、テキストデータ解析システム１は、第２文字列生成ステップで生成した文字列を第３キーワードとして、第２医療行為・医薬品マスタおよび第１医療行為・医薬品マスタから、第３キーワードを含む文字列（第１類似文字列）を抽出する（Ｓ１７１）（類似文字列抽出ステップ）。そして、テキストデータ解析システム１は、第１類似文字列を抽出できたかを判定する（Ｓ１７２）。 Next, the text data analysis system 1 uses the character string generated in the second character string generating step as the third keyword, and extracts characters containing the third keyword from the second medical practice/pharmaceutical master and the first medical practice/pharmaceutical master. A string (first similar character string) is extracted (S171) (similar character string extraction step). The text data analysis system 1 then determines whether the first similar character string has been extracted (S172).

第１類似文字列を抽出できた場合（Ｓ１７２，Ｙｅｓ）、テキストデータ解析システム１は、抽出した第１類似文字列が複数あるかを判定する（Ｓ１７３）。そして、抽出した第１類似文字列が１つである場合（Ｓ１７３，Ｎｏ）、ステップＳ１８３に進む。 If the first similar character string can be extracted (S172, Yes), the text data analysis system 1 determines whether there are multiple extracted first similar character strings (S173). Then, if the extracted first similar character string is one (S173, No), the process proceeds to step S183.

一方、類似文字列抽出ステップで抽出した文字列が複数ある場合（Ｓ１７３，Ｙｅｓ）、テキストデータ解析システム１は、類似文字列抽出ステップで抽出した第１類似文字列と、第２キーワードとの類似度をそれぞれ算出する（Ｓ１８１）（第１類似度算出ステップ）。そして、テキストデータ解析システム１は、類似文字列抽出ステップで抽出した第１類似文字列のうち、第１類似度算出ステップで算出した類似度が所定以上である第１類似文字列を抽出し（Ｓ１８２）、ステップＳ１８３に進む。 On the other hand, if there are a plurality of character strings extracted in the similar character string extraction step (S173, Yes), the text data analysis system 1 determines the similarity between the first similar character string extracted in the similar character string extraction step and the second keyword. degree is calculated (S181) (first degree of similarity calculation step). Then, the text data analysis system 1 extracts, from among the first similar character strings extracted in the similar character string extraction step, the first similar character strings for which the degree of similarity calculated in the first similarity degree calculation step is equal to or greater than a predetermined value ( S182) and proceeds to step S183.

ここで、例えば、診療明細書や調剤明細書等には、手術名等の診療行為に関する項目や医薬品に関する項目が複数記載されていたり、当該項目以外のほかの項目が記載されていたりすることがあるため、テキストデータには、一の項目を表す一群の文字列が複数存在する場合がある。そこで、テキストデータ解析システム１は、ステップＳ１８３において、次の、一の項目を表す一群の文字列があるかを判定する。テキストデータ解析システム１は、次の文字列がある場合（Ｓ１８３，Ｙｅｓ）、ステップＳ１２０に戻って以降の処理を実行し、次の文字列がない場合（Ｓ１８３，Ｎｏ）、ステップＳ１９１に進む。 Here, for example, a medical statement or a prescription statement may contain multiple items related to medical procedures such as the names of surgeries and items related to pharmaceuticals, or may contain items other than the relevant items. Therefore, text data may have a plurality of groups of character strings representing one item. Therefore, in step S183, the text data analysis system 1 determines whether there is the next group of character strings representing one item. If there is a next character string (S183, Yes), the text data analysis system 1 returns to step S120 and executes subsequent processing, and if there is no next character string (S183, No), proceeds to step S191.

ステップＳ１９１において、テキストデータ解析システム１は、第１検索ステップまたは第２検索ステップでヒットした文字列に対応する医療行為・医薬品の情報を抽出し、抽出した医療行為・医薬品の情報を出力する（Ｓ１９２）。または、テキストデータ解析システム１は、類似文字列抽出ステップで抽出した第１類似文字列に対応する少なくとも１つの医療行為・医薬品の情報を抽出し（Ｓ１９１）、抽出した医療行為・医薬品の情報を出力する（Ｓ１９２）（出力ステップ）。 In step S191, the text data analysis system 1 extracts medical practice/pharmaceutical information corresponding to the character string hit in the first search step or the second search step, and outputs the extracted medical practice/pharmaceutical information ( S192). Alternatively, the text data analysis system 1 extracts at least one medical practice/pharmaceutical information corresponding to the first similar character string extracted in the similar character string extraction step (S191), and extracts the extracted medical practice/pharmaceutical information. Output (S192) (output step).

ステップＳ１７２において、第２医療行為・医薬品マスタおよび第１医療行為・医薬品マスタから、第３キーワードを含む文字列（第１類似文字列）を抽出できなかった場合（Ｎｏ）、図１０に示すように、テキストデータ解析システム１は、第３キーワードと、第２医療行為・医薬品マスタに記憶された各索引文字列との類似度をそれぞれ算出する（Ｓ２０１）（第２類似度算出ステップ）。 In step S172, if the character string (first similar character string) containing the third keyword could not be extracted from the second medical practice/pharmaceutical master and the first medical practice/pharmaceutical master (No), as shown in FIG. Next, the text data analysis system 1 calculates the degree of similarity between the third keyword and each index character string stored in the second medical practice/drug master (S201) (second degree of similarity calculation step).

そして、テキストデータ解析システム１は、第２類似度算出ステップで算出した類似度が所定以上である索引文字列があるかを判定する（Ｓ２０２）。類似度が所定以上である索引文字列がある場合（Ｓ２０２，Ｙｅｓ）、テキストデータ解析システム１は、第２医療行為・医薬品マスタおよび第１医療行為・医薬品マスタから、当該索引文字列に対応する文字列（対応文字列）を抽出する（Ｓ２０３）。 Then, the text data analysis system 1 determines whether there is an index character string whose similarity calculated in the second similarity calculation step is equal to or greater than a predetermined value (S202). If there is an index character string with a degree of similarity greater than or equal to the predetermined value (S202, Yes), the text data analysis system 1 searches for the index character string from the second medical practice/drug master and the first medical practice/drug master. A character string (corresponding character string) is extracted (S203).

その後、テキストデータ解析システム１は、図９のステップＳ１７３に進み、抽出した対応文字列が複数あるかを判定し、１つである場合（Ｓ１７３，Ｎｏ）は、ステップＳ１８３に進み、複数である場合（Ｓ１７３，Ｙｅｓ）は、抽出した対応文字列と、第２キーワードとの類似度をそれぞれ算出し（Ｓ１８１）、以降の処理を実行する。 After that, the text data analysis system 1 proceeds to step S173 in FIG. 9, determines whether or not there are a plurality of extracted corresponding character strings, and if there is one (S173, No), proceeds to step S183 and If so (S173, Yes), the degree of similarity between the extracted corresponding character string and the second keyword is calculated (S181), and the subsequent processes are executed.

図１０に戻り、ステップＳ２０２において、第２類似度算出ステップ（Ｓ２０１）で算出した類似度が所定以上である索引文字列がない場合（Ｓ２０２，Ｎｏ）、テキストデータ解析システム１は、要素文字列マスタを参照して、第２キーワードから、要素文字列を抽出する（Ｓ２１１）（要素文字列抽出ステップ）。 Returning to FIG. 10, in step S202, if there is no index character string for which the degree of similarity calculated in the second degree of similarity calculation step (S201) is equal to or higher than a predetermined value (S202, No), the text data analysis system 1 calculates the element string The master is referred to, and element character strings are extracted from the second keyword (S211) (element character string extraction step).

次に、テキストデータ解析システム１は、抽出した要素文字列が複数あるかを判定する（Ｓ２１２）。抽出した要素文字列が複数ある場合（Ｓ２１２，Ｙｅｓ）、テキストデータ解析システム１は、先に第２類似文字列を抽出する要素文字列、具体的には、第２キーワードの中央に近い位置にある要素文字列を決定し（Ｓ２１３）、決定した要素文字列を第４キーワードとして、第１医療行為・医薬品マスタから、第４キーワードを含む文字列（第２類似文字列）を抽出する（Ｓ２１４）。また、抽出した要素文字列が１つである場合（Ｓ２１２，Ｎｏ）、テキストデータ解析システム１は、当該要素文字列を第４キーワードとして、第１医療行為・医薬品マスタから、第２類似文字列を抽出する（Ｓ２１４）。 Next, the text data analysis system 1 determines whether a plurality of element character strings have been extracted (S212). If so (S212, Yes), the text data analysis system 1 determines which element character string to use first for extracting second similar character strings, specifically the element character string closest to the center of the second keyword (S213), and, using that element character string as the fourth keyword, extracts character strings containing the fourth keyword (second similar character strings) from the first medical practice/drug master (S214). If only one element character string has been extracted (S212, No), the text data analysis system 1 uses that element character string as the fourth keyword and extracts second similar character strings from the first medical practice/drug master (S214).

次に、テキストデータ解析システム１は、抽出した第２類似文字列と、第２キーワードとの類似度を算出する（Ｓ２２１）（第３類似度算出ステップ）。次に、テキストデータ解析システム１は、算出した類似度が所定以上である第２類似文字列があるかを判定する（Ｓ２２２）。類似度が所定以上である第２類似文字列がない場合（Ｓ２２２，Ｎｏ）、ステップＳ２１３に戻って、次の、先に第２類似文字列を抽出する要素文字列を決定し、以降の処理を実行する。一方、ステップＳ２２２において、類似度が所定以上である第２類似文字列がある場合（Ｙｅｓ）、テキストデータ解析システム１は、図９のステップＳ１８２に進み、類似度が所定以上である第２類似文字列を抽出し（Ｓ１８２）、以降の処理を実行する。 Next, the text data analysis system 1 calculates the similarity between each extracted second similar character string and the second keyword (S221) (third similarity calculation step). The text data analysis system 1 then determines whether any second similar character string has a calculated similarity equal to or greater than the predetermined value (S222). If none does (S222, No), the process returns to step S213, determines the next element character string to use for extracting second similar character strings, and executes the subsequent processing. If, on the other hand, a second similar character string with a similarity equal to or greater than the predetermined value exists in step S222 (Yes), the text data analysis system 1 proceeds to step S182 in FIG. 9, extracts the second similar character strings whose similarity is equal to or greater than the predetermined value (S182), and executes the subsequent processing.
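The loop over steps S213 to S222 can be sketched in Python as follows. The function name `match_by_elements`, the sample master entries, and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions made for illustration. Returning an empty list when every element character string has been tried without success is also an assumption, since this passage does not say what happens in that case.

```python
from difflib import SequenceMatcher


def match_by_elements(second_keyword, ordered_elements, master, threshold=0.8):
    """Try element strings in order; stop at the first that yields a match.

    ordered_elements is assumed to be sorted already, nearest to the
    center of the second keyword first (S213).
    """
    for element in ordered_elements:
        # S214: entries of the master that contain the fourth keyword.
        candidates = [name for name in master if element in name]
        # S221: similarity of each candidate to the second keyword.
        scored = [(name, SequenceMatcher(None, second_keyword, name).ratio())
                  for name in candidates]
        # S222: keep candidates at or above the threshold.
        hits = [(name, score) for name, score in scored if score >= threshold]
        if hits:
            return hits  # later element strings are not examined
    return []  # assumption: no element string produced a match


# Usage with a hypothetical master and a one-character misread.
master = ["lens insertion type A", "frame insertion", "lens cleaning"]
print(match_by_elements("lens insertlon type A", ["frame", "lens"], master))
```

Stopping at the first element character string that produces a match is what keeps the amount of similarity computation small, in line with the processing-speed benefit the description attributes to this ordering.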

以上の第１実施形態によれば、マスタにキーワードを必要以上に追加していくことなく、テキストデータから医療行為または医薬品を表す文字列を抽出することができる。また、テキストデータから医療行為または医薬品を表す文字列を、厚生労働省が定めた基本マスタに収載された形式に変換して抽出することができる。 According to the first embodiment described above, it is possible to extract a character string representing a medical practice or medicine from text data without adding keywords more than necessary to the master. In addition, character strings representing medical practices or medicines can be extracted from text data by converting them into the format listed in the basic master established by the Ministry of Health, Labor and Welfare.

また、第１類似度算出手段１８をさらに備えることで、医療行為、医薬品の情報を絞り込んで出力することができる。 Moreover, by further providing the first similarity calculation means 18, it is possible to narrow down and output information on medical practices and medicines.

また、第２類似度算出手段２０をさらに備えることで、テキストデータから医療行為または医薬品を表す文字列をより確実に抽出することができる。 Further, by further providing the second similarity calculation means 20, it is possible to more reliably extract character strings representing medical practices or medicines from text data.

また、要素文字列抽出手段２１をさらに備えることで、テキストデータから医療行為または医薬品を表す文字列をさらに確実に抽出することができる。 Further, by further providing the element character string extraction means 21, it is possible to more reliably extract character strings representing medical practices or medicines from text data.

また、第３類似度算出手段２２をさらに備え、先に類似度を算出した第２類似文字列の中に類似度が所定以上である第２類似文字列がある場合に、当該第２類似文字列に対応する傷病名の情報を出力して処理を終了するので、医療行為または医薬品の情報を出力するまでの処理量を少なくして処理速度を速くすることができる。 Further, a third similarity calculation means 22 is further provided, and when there is a second similar character string having a similarity equal to or higher than a predetermined level among the second similar character strings whose similarity has been previously calculated, the second similar character string is calculated. Since the information on the disease name corresponding to the column is output and the processing ends, the amount of processing until the information on the medical practice or medicine is output can be reduced and the processing speed can be increased.

また、第２文字列生成手段１６が、第２キーワードから、括弧および当該括弧によって囲われた文字列を取り除くだけでなく、上記の文字列（１）～（５）をさらに取り除くので、医療行為、医薬品の情報を絞り込みやすくすることができる。 In addition, the second character string generating means 16 not only removes the parentheses and the character strings enclosed by the parentheses from the second keyword, but also removes the above character strings (1) to (5) from the second keyword. , can make it easier to narrow down information on pharmaceuticals.

なお、第１実施形態では、第２文字列生成手段１６は、第２文字列生成ステップで第２キーワードから、上述の文字列（１）～（５）を取り除いたが、第２キーワードから、上述の文字列（１）～（５）の少なくとも１つを取り除く構成であればよい。また、例えば、第２文字列生成手段１６は、第２文字列生成ステップで第２キーワードから、括弧および当該括弧によって囲われた文字列のみを取り除く構成であってもよい。 In the first embodiment, the second character string generation means 16 removes the above-described character strings (1) to (5) from the second keyword in the second character string generation step. Any configuration that removes at least one of the above character strings (1) to (5) is sufficient. Further, for example, the second character string generation means 16 may be configured to remove only parentheses and character strings enclosed by the parentheses from the second keyword in the second character string generation step.

また、第１実施形態では、類似文字列抽出手段１７は、第２医療行為・医薬品マスタおよび第１医療行為・医薬品マスタを参照して、第１類似文字列を抽出したが、例えば、第１医療行為・医薬品マスタのみを参照して、第１類似文字列を抽出する構成であってもよい。すなわち、テキストデータ解析システムは、第２医療行為・医薬品マスタを備えない構成であってもよい。また、第１医療行為・医薬品マスタに、索引文字列を、医療行為・医薬品を表す文字列等と対応させて記憶させておいてもよい。 In the first embodiment, the similar character string extraction means 17 refers to the second medical practice/drug master and the first medical practice/drug master to extract the first similar character string. The first similar character string may be extracted by referring only to the medical practice/medicine master. That is, the text data analysis system may be configured without the second medical practice/drug master. In addition, index character strings may be stored in the first medical practice/drug master in association with character strings representing medical practice/drugs.

また、第１実施形態では、医療行為・医薬品マスタは、点数表区分番号や薬価基準コードを記憶していたが、その他のコード、例えば、医科診療行為マスタの診療行為コードや、医薬品マスタの医薬品コードなどを記憶するものであってもよい。 In addition, in the first embodiment, the medical practice/drug master stores the score table classification number and the drug price standard code, but other codes such as the medical practice code of the medical practice master and the medicines of the drug master It may also store a code or the like.

次に、第２実施形態について説明する。なお、以下では、第１実施形態と異なる点について詳細に説明し、同じ点については同一の要素に同一の符号を付す等して適宜説明を省略する。 Next, a second embodiment will be described. In the following, points different from the first embodiment will be described in detail, and description of the same points will be omitted as appropriate by, for example, assigning the same reference numerals to the same elements.

図１１に示すように、第２実施形態に係るテキストデータ解析システム１は、例えば、診断書等に記載された項目に基づいて作成されたテキストデータから、傷病名に関する項目をＩＣＤ１０対応標準病名マスタに収載された形式に変換して抽出するシステムである。テキストデータ解析システム１は、データ取得手段１１と、文字列抽出手段１２と、第１検索手段２３と、第１文字列生成手段２４と、第２検索手段２５と、第２文字列生成手段２６と、第３検索手段２７と、要素文字列抽出手段２８と、類似文字列抽出手段２９と、類似度算出手段３０と、出力手段３１と、記憶装置９０とを備える。 As shown in FIG. 11, the text data analysis system 1 according to the second embodiment converts items related to disease names from text data created based on items described in a medical certificate, for example, to an ICD 10 compatible standard disease name master. It is a system that converts and extracts the format listed in. The text data analysis system 1 includes data acquisition means 11, character string extraction means 12, first search means 23, first character string generation means 24, second search means 25, and second character string generation means 26. , third search means 27 , element character string extraction means 28 , similar character string extraction means 29 , similarity degree calculation means 30 , output means 31 , and storage device 90 .

第２実施形態において、ＲＯＭや記憶装置９０に記憶させておいたコンピュータプログラムは、テキストデータ解析システム１を構成するコンピュータを、データ取得手段１１と、文字列抽出手段１２と、第１検索手段２３と、第１文字列生成手段２４と、第２検索手段２５と、第２文字列生成手段２６と、第３検索手段２７と、要素文字列抽出手段２８と、類似文字列抽出手段２９と、類似度算出手段３０と、出力手段３１として機能させる。 In the second embodiment, the computer program stored in the ROM or the storage device 90 causes the computer constituting the text data analysis system 1 to function as data acquisition means 11, character string extraction means 12, first search means 23, first character string generation means 24, second search means 25, second character string generation means 26, third search means 27, element character string extraction means 28, similar character string extraction means 29, similarity calculation means 30, and output means 31.

記憶装置９０には、傷病名マスタと、接尾語マスタと、接頭語マスタと、要素文字列マスタとが記憶されている。 The storage device 90 stores a disease name master, a suffix master, a prefix master, and an element character string master.
図１２に示すように、傷病名マスタは、傷病名を表す文字列を記憶するテーブルとして構成されている。傷病名マスタは、ＩＣＤ１０対応標準病名マスタに収載された傷病名を表す文字列と、ＩＣＤコードとを対応させて記憶している。 As shown in FIG. 12, the disease name master is configured as a table that stores character strings representing injury and disease names. The disease name master stores the character strings representing the injury and disease names listed in the ICD10-compatible standard disease name master in association with ICD codes.

図１３（ａ）に示すように、接尾語マスタは、文字列の後尾に付く特定の文字列である接尾語を記憶するテーブルとして構成されている。接尾語は、診断書等に記載される傷病名の後尾に付けられることがある、例えば、「の疑い」、「の術後」、「の術前」、「の増悪」、「の治療後」、「の二次感染」等の文字列である。接尾語は、例えば、修飾語マスタに収載された接尾語に使用する修飾語を参考に決めることができる。 As shown in FIG. 13(a), the suffix master is configured as a table that stores suffixes, which are specific character strings attached to the end of character strings. The suffixes are character strings that may be appended to an injury or disease name described in a medical certificate or the like, for example, "suspected", "post-operative", "pre-operative", "exacerbation of", "after treatment of", and "secondary infection of". The suffixes can be determined, for example, by referring to the modifiers used as suffixes that are listed in the modifier master.

図１３（ｂ）に示すように、接頭語マスタは、文字列の先頭に付く特定の文字列である接頭語を記憶するテーブルとして構成されている。接頭語は、診断書等に記載される傷病名の先頭に付けられることがある、例えば、「下」、「急性」、「上」、「左」、「左側」、「右」、「右側」等の文字列である。接頭語は、例えば、修飾語マスタに収載された接頭語に使用する修飾語を参考に決めることができる。 As shown in FIG. 13(b), the prefix master is configured as a table that stores prefixes, which are specific character strings attached to the beginning of character strings. The prefixes are character strings that may be prepended to an injury or disease name described in a medical certificate or the like, for example, "lower", "acute", "upper", "left", "left side", "right", and "right side". The prefixes can be determined, for example, by referring to the modifiers used as prefixes that are listed in the modifier master.

図１３（ｃ）に示すように、要素文字列マスタは、要素文字列を記憶するテーブルとして構成されている。要素文字列は、傷病名に含まれる特定の文字列であり、例えば、「真珠」、「腱」、「室」、「耳」、「縮」、「潰」、「中耳」等の文字列である。要素文字列としては、傷病名を絞り込む際にヒットする候補が多くなりすぎないような文字列、例えば、候補を５００件以下程度に絞り込めるような文字列を採用している。 As shown in FIG. 13(c), the element character string master is configured as a table that stores element character strings. An element character string is a specific character string contained in an injury or disease name, for example, "pearl", "tendon", "chamber", "ear", "contraction", "crush", or "middle ear". The element character strings adopted are ones that do not yield too many candidate hits when narrowing down disease names, for example, character strings that narrow the candidates down to roughly 500 or fewer.
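
As a rough illustration of the four masters described above, the tables can be pictured as simple in-memory collections. The Python sketch below is an assumption made for illustration only: the names `DISEASE_MASTER`, `SUFFIX_MASTER`, `PREFIX_MASTER`, `ELEMENT_MASTER` and `lookup` are hypothetical, and the entries are limited to the example strings and ICD codes quoted in this description.

```python
# Hypothetical in-memory stand-ins for the masters held in storage device 90.
# Only strings and codes that appear in this description are used as sample data.

# Disease name master: disease name string -> ICD code.
DISEASE_MASTER = {
    "インフルエンザ肺炎": "J110",
    "真珠腫性中耳炎": "H71",
    "急性中耳炎": "H669",
    "慢性中耳炎": "H669",
}

# Suffix master: strings that may trail a disease name.
SUFFIX_MASTER = ["の疑い", "の術後", "の術前", "の増悪", "の治療後", "の二次感染"]

# Prefix master: strings that may precede a disease name.
PREFIX_MASTER = ["下", "急性", "上", "左", "左側", "右", "右側"]

# Element character string master: substrings used to narrow down candidates.
ELEMENT_MASTER = ["真珠", "腱", "室", "耳", "縮", "潰", "中耳"]


def lookup(name):
    """Exact-match search of the disease name master; returns the ICD code or None."""
    return DISEASE_MASTER.get(name)
```

An exact-match search then reduces to a dictionary lookup, which returns `None` when nothing is hit.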

図１１に戻り、第１検索手段２３は、文字列抽出手段１２が抽出した文字列を第１キーワードとして、傷病名マスタから、第１キーワードと一致する文字列がヒットするかを検索する。第１検索手段２３は、検索の結果、第１キーワードと一致する文字列がヒットした場合、ヒットした文字列に対応する傷病名の情報を出力する。 Returning to FIG. 11, the first search means 23 uses the character string extracted by the character string extraction means 12 as the first keyword, and searches the disease name master for a character string matching the first keyword. When a character string that matches the first keyword is hit as a result of the search, the first search means 23 outputs information on the disease name corresponding to the hit character string.

ここで、第２実施形態において、第１検索手段２３、第２検索手段２５、第３検索手段２７および出力手段３１は、傷病名の情報として、傷病名とＩＣＤコードの情報を出力する。 Here, in the second embodiment, the first search means 23, the second search means 25, the third search means 27, and the output means 31 output the information of the disease name and the ICD code as the information of the disease name.

第１文字列生成手段２４は、第１検索手段２３による検索の結果、第１キーワードと一致する文字列がヒットしない場合、接尾語マスタを参照して、第１キーワードから、接尾語を取り除いた文字列を生成する。また、第１文字列生成手段２４は、第１キーワードに、接尾語マスタに記憶された接尾語がない場合、生成する文字列を第１キーワードとする。 When the search by the first search means 23 finds no character string that matches the first keyword, the first character string generation means 24 refers to the suffix master and generates a character string by removing the suffix from the first keyword. If the first keyword contains no suffix stored in the suffix master, the first character string generation means 24 uses the first keyword as-is as the generated character string.

第２検索手段２５は、第１文字列生成手段２４が生成した文字列を第２キーワードとして、傷病名マスタから、第２キーワードと一致する文字列がヒットするかを検索する。第２検索手段２５は、検索の結果、第２キーワードと一致する文字列がヒットした場合、ヒットした文字列に対応する傷病名の情報を出力する。 Using the character string generated by the first character string generating unit 24 as a second keyword, the second search means 25 searches the disease name master for a character string that matches the second keyword. When a character string that matches the second keyword is hit as a result of the search, the second search means 25 outputs information on the disease name corresponding to the hit character string.

第２文字列生成手段２６は、第２検索手段２５による検索の結果、第２キーワードと一致する文字列がヒットしない場合、接頭語マスタを参照して、第２キーワードから、接頭語を取り除いた文字列を生成する。 When the search by the second search means 25 finds no character string that matches the second keyword, the second character string generation means 26 refers to the prefix master and generates a character string by removing the prefix from the second keyword.

なお、第１キーワードと第２キーワードが同じ文字列である場合、第２検索手段２５は、第２キーワードと一致する文字列がヒットするかを検索することなく、第２文字列生成手段２６が、第２キーワードから、接尾語を取り除いた文字列を生成してもよい。 Note that, when the first keyword and the second keyword are the same character string, the second search means 25 may skip the search for a character string matching the second keyword, and the second character string generation means 26 may generate a character string by removing the suffix from the second keyword.

また、第２文字列生成手段２６は、第２キーワードに、接頭語マスタに記憶された接頭語がない場合、生成する文字列を第２キーワードとする。 If the second keyword contains no prefix stored in the prefix master, the second character string generation means 26 uses the second keyword as-is as the generated character string.
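
The suffix and prefix removal performed by the first and second character string generation means could be sketched as follows. This is a minimal sketch, not the implementation of the embodiment: the function names are hypothetical, and trying longer affixes first (so that "左側" is preferred over "左") and leaving a keyword untouched when it consists only of the affix are assumptions not stated in this description.

```python
def strip_suffix(keyword, suffixes):
    """Return keyword with one trailing suffix from the suffix master removed.

    If no suffix matches, the keyword is returned unchanged, as described for
    the first character string generation means 24.
    """
    # Assumption: longer suffixes are tried first.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if keyword.endswith(suffix) and len(keyword) > len(suffix):
            return keyword[: -len(suffix)]
    return keyword


def strip_prefix(keyword, prefixes):
    """Return keyword with one leading prefix from the prefix master removed.

    If no prefix matches, the keyword is returned unchanged, as described for
    the second character string generation means 26.
    """
    # Assumption: longer prefixes are tried first.
    for prefix in sorted(prefixes, key=len, reverse=True):
        if keyword.startswith(prefix) and len(keyword) > len(prefix):
            return keyword[len(prefix):]
    return keyword
```

For instance, `strip_suffix("急性インフルエンザ肺炎の疑い", ["の疑い", "の術後"])` yields the second keyword "急性インフルエンザ肺炎", and `strip_prefix` applied to that with `["急性", "右"]` yields "インフルエンザ肺炎".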

第３検索手段２７は、第２文字列生成手段２６が生成した文字列を第３キーワードとして、傷病名マスタから、第３キーワードと一致する文字列がヒットするかを検索する。第３検索手段２７は、検索の結果、第３キーワードと一致する文字列がヒットした場合、ヒットした文字列に対応する傷病名の情報を出力する。 The third search means 27 uses the character string generated by the second character string generation means 26 as the third keyword to search the disease name master for a character string that matches the third keyword. When a character string matching the third keyword is found as a result of the search, the third search means 27 outputs information on the disease name corresponding to the hit character string.

要素文字列抽出手段２８は、第３検索手段２７による検索の結果、第３キーワードと一致する文字列がヒットしない場合、要素文字列マスタを参照して、第３キーワードから、少なくとも１つの要素文字列を抽出する。 When the search by the third search means 27 finds no character string that matches the third keyword, the element character string extraction means 28 refers to the element character string master and extracts at least one element character string from the third keyword.

なお、第２キーワードと第３キーワードが同じ文字列である場合、第３検索手段２７は、第３キーワードと一致する文字列がヒットするかを検索することなく、要素文字列抽出手段２８が、第３キーワードから、要素文字列を抽出してもよい。 Note that, when the second keyword and the third keyword are the same character string, the third search means 27 may skip the search for a character string matching the third keyword, and the element character string extraction means 28 may extract the element character strings from the third keyword.
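
The extraction of element character strings from the third keyword can be sketched as a substring test against the element character string master. The function name `extract_elements` is hypothetical. Discarding an element that is itself contained in a longer matched element (so that "耳" is dropped when "中耳" also matches) is an assumption, adopted here because the worked example later in this description extracts only "真珠" and "中耳" from "真珠性中耳炎".

```python
def extract_elements(keyword, element_master):
    """Return the element character strings from the master found in keyword."""
    # Every master entry that occurs somewhere in the keyword is a hit.
    hits = [e for e in element_master if e in keyword]
    # Assumption: drop a hit that is a substring of another, longer hit.
    return [e for e in hits if not any(e != other and e in other for other in hits)]
```

With the sample master `["真珠", "腱", "室", "耳", "縮", "潰", "中耳"]`, the keyword "真珠性中耳炎" gives `["真珠", "中耳"]`.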

類似文字列抽出手段２９は、要素文字列抽出手段２８が抽出した要素文字列を第４キーワードとして、傷病名マスタから、第４キーワードを含む文字列（以下、「類似文字列」ともいう。）を抽出する。 The similar character string extraction means 29 uses the element character string extracted by the element character string extraction means 28 as the fourth keyword, and extracts from the disease name master the character strings containing the fourth keyword (hereinafter also referred to as "similar character strings").

類似度算出手段３０は、類似文字列抽出手段２９が抽出した類似文字列と、第３キーワードとの類似度を算出する。一例として、類似度算出手段３０は、レーベンシュタイン距離およびジャロ・ウィンクラー距離の少なくとも一方に基づいて、類似文字列と、第３キーワードとの類似度を算出する。第２実施形態においても、類似度は、０から１の数値として算出される。 The similarity calculator 30 calculates the similarity between the similar character string extracted by the similar character string extractor 29 and the third keyword. As an example, the similarity calculator 30 calculates the similarity between the similar character string and the third keyword based on at least one of the Levenshtein distance and the Jaro-Winkler distance. The degree of similarity is calculated as a numerical value from 0 to 1 in the second embodiment as well.
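
One way to obtain a similarity between 0 and 1 from the Levenshtein distance is to normalize the distance by the length of the longer string. The sketch below assumes that normalization; this description does not specify how the distance is converted to a similarity, and the Jaro-Winkler alternative is not shown. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```

Under this assumed normalization, "真珠性中耳炎" and "真珠腫性中耳炎" differ by one inserted character, giving a similarity of 6/7. Values produced this way need not coincide with those shown in the figures.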

出力手段３１は、類似文字列抽出手段２９が抽出した類似文字列に対応する少なくとも１つの傷病名の情報を出力する。詳しくは、出力手段３１は、類似文字列抽出手段２９が抽出した類似文字列のうち、類似度算出手段３０が算出した類似度が所定以上である文字列に対応する傷病名の情報を出力する。 The output means 31 outputs information on at least one disease name corresponding to the similar character string extracted by the similar character string extraction means 29 . More specifically, the output means 31 outputs the information of the injury or disease name corresponding to the character string whose similarity calculated by the similarity calculation means 30 is equal to or higher than a predetermined similarity among the similar character strings extracted by the similar character string extraction means 29. .

第２実施形態では、類似文字列抽出手段２９は、要素文字列抽出手段２８が抽出した要素文字列が複数ある場合、第３キーワードの中央に近い位置にある要素文字列（以下、第２実施形態において「優先要素文字列」ともいう。）を、まず、第４キーワードとして、傷病名マスタから、類似文字列を抽出する。 In the second embodiment, when there are a plurality of element character strings extracted by the element character string extraction means 28, the similar character string extraction means 29 extracts an element character string located near the center of the third keyword (hereinafter referred to as the second embodiment). Also referred to as "priority element character string" in the form.) is first used as the fourth keyword, and similar character strings are extracted from the disease name master.

類似度算出手段３０は、優先要素文字列を第４キーワードとして類似文字列抽出手段２９が抽出した類似文字列について、他の要素文字列よりも先に第３キーワードとの類似度を算出する。 The similarity calculation means 30 calculates the similarity to the third keyword for the similar character strings that the similar character string extraction means 29 extracted using the priority element character string as the fourth keyword, before doing so for those of the other element character strings.

出力手段３１は、先に類似度を算出した類似文字列の中に類似度が所定以上である文字列がある場合、当該類似文字列に対応する傷病名の情報を出力する。出力手段３１が、先に類似度を算出した類似文字列に対応する傷病名の情報を出力した場合、それ以降、類似文字列抽出手段２９は、他の要素文字列について類似文字列を抽出せず、類似度算出手段３０は、類似度を算出しない。 If the similar character strings whose similarity was calculated first include a character string whose similarity is equal to or greater than a predetermined value, the output means 31 outputs the information on the disease name corresponding to that similar character string. Once the output means 31 has output the information on the disease name corresponding to a similar character string whose similarity was calculated first, the similar character string extraction means 29 does not extract similar character strings for the other element character strings, and the similarity calculation means 30 does not calculate any further similarity.

類似文字列抽出手段２９は、先に類似度を算出した類似文字列の中に類似度が所定以上である文字列がない場合、次に第３キーワードの中央に近い位置にある要素文字列を第４キーワードとして、傷病名マスタから、類似文字列を抽出し、類似度算出手段３０は、抽出した類似文字列について第３キーワードとの類似度を算出する。そして、出力手段３１は、類似度を算出した類似文字列の中に類似度が所定以上である文字列がある場合、当該類似文字列に対応する傷病名の情報を出力する。 If the similar character strings whose similarity was calculated first include no character string whose similarity is equal to or greater than the predetermined value, the similar character string extraction means 29 extracts similar character strings from the disease name master using, as the fourth keyword, the element character string that is next closest to the center of the third keyword, and the similarity calculation means 30 calculates the similarity of the extracted similar character strings to the third keyword. Then, if the similar character strings for which the similarity has been calculated include a character string whose similarity is equal to or greater than the predetermined value, the output means 31 outputs the information on the disease name corresponding to that similar character string.

要素文字列抽出手段２８が抽出した要素文字列が複数ある場合において、２つの要素文字列の第３キーワードの中央からの位置が同じである場合、類似文字列抽出手段２９は、各要素文字列を第４キーワードとして、傷病名マスタから、類似文字列をそれぞれ抽出し、抽出した類似文字列について第３キーワードとの類似度をそれぞれ算出する。そして、出力手段３１は、類似度を算出した類似文字列の中に類似度が所定以上である文字列がある場合、当該類似文字列に対応する傷病名の情報を出力する。なお、この場合、抽出した類似文字列の数が少ない方について、多い方よりも先に類似度を算出し、先に類似度を算出した類似文字列の中に類似度が所定以上である文字列がある場合、当該類似文字列に対応する傷病名の情報を出力して、処理を終了してもよい。 When the element character string extraction means 28 has extracted a plurality of element character strings and two of them are at the same distance from the center of the third keyword, the similar character string extraction means 29 extracts similar character strings from the disease name master using each of those element character strings as the fourth keyword, and the similarity to the third keyword is calculated for each of the extracted similar character strings. Then, if the similar character strings for which the similarity has been calculated include a character string whose similarity is equal to or greater than the predetermined value, the output means 31 outputs the information on the disease name corresponding to that similar character string. In this case, the similarity may be calculated first for the element character string that yielded fewer similar character strings, and if those similar character strings include one whose similarity is equal to or greater than the predetermined value, the information on the disease name corresponding to that similar character string may be output and the processing terminated.
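
The priority given to the element character string nearest the center of the third keyword could be sketched as follows. How "position from the center" is measured is not defined in this description; the sketch assumes the distance between the midpoint of the element's first occurrence and the midpoint of the keyword. The function names are hypothetical, and the special handling of ties described above is not implemented (tied elements simply keep their input order).

```python
def center_distance(keyword, element):
    """Assumed measure: |midpoint of element occurrence - midpoint of keyword|."""
    start = keyword.find(element)  # element is assumed to occur in keyword
    return abs((start + len(element) / 2) - len(keyword) / 2)


def order_by_center(keyword, elements):
    """Return elements sorted so that the one nearest the center comes first."""
    return sorted(elements, key=lambda e: center_distance(keyword, e))
```

For the keyword "真珠性中耳炎", "中耳" lies at an assumed distance of 1.0 from the center and "真珠" at 2.0, so "中耳" is tried first, which agrees with the worked example in this description.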

ここで、具体的な例を示しながら、第２実施形態のテキストデータ解析システム１における処理について説明する。 Here, the processing in the text data analysis system 1 of the second embodiment will be described using specific examples.
第１の例として、図１４（ａ）に示すように、第１検索手段２３は、データ取得手段１１が取得し、文字列抽出手段１２が抽出した文字列“急性インフルエンザ肺炎の疑い”を第１キーワードとして、傷病名マスタから、第１キーワードと一致する文字列がヒットするかを検索する。 As a first example, as shown in FIG. 14(a), the first search means 23 uses the character string "suspected acute influenza pneumonia", acquired by the data acquisition means 11 and extracted by the character string extraction means 12, as the first keyword, and searches the disease name master for a character string that matches the first keyword.

第１文字列生成手段２４は、第１検索手段２３による検索の結果、第１キーワードと一致する文字列がヒットしない場合、図１４（ｂ）に示すように、接尾語マスタを参照して、第１キーワードから、接尾語“の疑い”を取り除いた文字列“急性インフルエンザ肺炎”を生成する。 When the search by the first search means 23 finds no character string that matches the first keyword, the first character string generation means 24 refers to the suffix master and, as shown in FIG. 14(b), generates the character string "acute influenza pneumonia" by removing the suffix "suspected" from the first keyword.

第２検索手段２５は、第１文字列生成手段２４が生成した文字列“急性インフルエンザ肺炎”を第２キーワードとして、傷病名マスタから、第２キーワードと一致する文字列がヒットするかを検索する。 The second search means 25 uses the character string "acute influenza pneumonia" generated by the first character string generation means 24 as a second keyword, and searches the disease name master for a character string that matches the second keyword. .

第２文字列生成手段２６は、第２検索手段２５による検索の結果、第２キーワードと一致する文字列がヒットしない場合、図１４（ｃ）に示すように、接頭語マスタを参照して、第２キーワードから、接頭語“急性”を取り除いた文字列“インフルエンザ肺炎”を生成する。 If the second search means 25 does not find a hit of a character string that matches the second keyword, the second character string generation means 26 refers to the prefix master as shown in FIG. From the second keyword, a character string "influenza pneumonia" is generated by removing the prefix "acute".

第３検索手段２７は、第２文字列生成手段２６が生成した文字列“インフルエンザ肺炎”を第３キーワードとして、傷病名マスタから、第３キーワードと一致する文字列がヒットするかを検索する。第３検索手段２７は、検索の結果、第３キーワードと一致する文字列がヒットした場合、ヒットした文字列に対応する傷病名の情報を出力する。具体的には、第３検索手段２７は、図１４（ｄ）に示すように、傷病名「インフルエンザ肺炎」とＩＣＤコード「Ｊ１１０」の情報を出力する。 The third search means 27 uses the character string “influenza pneumonia” generated by the second character string generation means 26 as the third keyword, and searches the disease name master for a character string matching the third keyword. When a character string matching the third keyword is found as a result of the search, the third search means 27 outputs information on the disease name corresponding to the hit character string. Specifically, as shown in FIG. 14(d), the third search means 27 outputs the information of the disease name "influenza pneumonia" and the ICD code "J110".
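
The first example can be traced with a small sketch of the three exact-match searches. The data and names below are hypothetical stand-ins reduced to what this example needs; unlike the embodiment, which generates each keyword only after the preceding search fails, the sketch computes all three keywords up front for brevity, which does not change the result.

```python
DISEASE_MASTER = {"インフルエンザ肺炎": "J110"}  # sample entry from this description
SUFFIXES = ["の疑い", "の術後"]
PREFIXES = ["急性", "右"]


def strip_affix(keyword, affixes, at_end):
    """Remove one matching affix (longest first, an assumption) or return keyword."""
    for affix in sorted(affixes, key=len, reverse=True):
        matched = keyword.endswith(affix) if at_end else keyword.startswith(affix)
        if matched and len(keyword) > len(affix):
            return keyword[: -len(affix)] if at_end else keyword[len(affix):]
    return keyword


def exact_cascade(first_keyword):
    """Try the first, second and third keywords in order.

    Returns (disease name, ICD code, stage) for the first hit, or None.
    """
    second_keyword = strip_affix(first_keyword, SUFFIXES, at_end=True)
    third_keyword = strip_affix(second_keyword, PREFIXES, at_end=False)
    for stage, keyword in enumerate((first_keyword, second_keyword, third_keyword), start=1):
        if keyword in DISEASE_MASTER:
            return keyword, DISEASE_MASTER[keyword], stage
    return None
```

Calling `exact_cascade("急性インフルエンザ肺炎の疑い")` misses on the first and second keywords and hits on the third, returning the disease name "インフルエンザ肺炎" with the ICD code "J110".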

また、第２の例として、図１５（ａ）に示すように、第１検索手段２３は、データ取得手段１１が取得し、文字列抽出手段１２が抽出した文字列“右真珠性中耳炎の術後”を第１キーワードとして、傷病名マスタから、第１キーワードと一致する文字列がヒットするかを検索する。 As a second example, as shown in FIG. 15(a), the first search means 23 uses the character string "post-operative right pearl otitis media", acquired by the data acquisition means 11 and extracted by the character string extraction means 12, as the first keyword, and searches the disease name master for a character string that matches the first keyword.

第１文字列生成手段２４は、第１検索手段２３による検索の結果、第１キーワードと一致する文字列がヒットしない場合、図１５（ｂ）に示すように、接尾語マスタを参照して、第１キーワードから、接尾語“の術後”を取り除いた文字列“右真珠性中耳炎”を生成する。 When the search by the first search means 23 finds no character string that matches the first keyword, the first character string generation means 24 refers to the suffix master and, as shown in FIG. 15(b), generates the character string "right pearl otitis media" by removing the suffix "post-operative" from the first keyword.

第２検索手段２５は、第１文字列生成手段２４が生成した文字列“右真珠性中耳炎”を第２キーワードとして、傷病名マスタから、第２キーワードと一致する文字列がヒットするかを検索する。 The second search means 25 uses the character string "right pearl otitis media" generated by the first character string generation means 24 as the second keyword, and searches the disease name master for a character string that matches the second keyword.

第２文字列生成手段２６は、第２検索手段２５による検索の結果、第２キーワードと一致する文字列がヒットしない場合、図１５（ｃ）に示すように、接頭語マスタを参照して、第２キーワードから、接頭語“右”を取り除いた文字列“真珠性中耳炎”を生成する。 When the search by the second search means 25 finds no character string that matches the second keyword, the second character string generation means 26 refers to the prefix master and, as shown in FIG. 15(c), generates the character string "pearl otitis media" by removing the prefix "right" from the second keyword.

第３検索手段２７は、第２文字列生成手段２６が生成した文字列“真珠性中耳炎”を第３キーワードとして、傷病名マスタから、第３キーワードと一致する文字列がヒットするかを検索する。 The third search means 27 uses the character string "pearl otitis media" generated by the second character string generation means 26 as the third keyword, and searches the disease name master for a character string matching the third keyword. .

要素文字列抽出手段２８は、第３検索手段２７による検索の結果、第３キーワードと一致する文字列がヒットしない場合、図１５（ｄ）に示すように、要素文字列マスタを参照して、第３キーワードから、要素文字列“真珠”および“中耳”を抽出する。 When the search by the third search means 27 does not find a hit of a string matching the third keyword, the element string extraction means 28 refers to the element string master as shown in FIG. 15(d), Element character strings "pearl" and "middle ear" are extracted from the third keyword.

類似文字列抽出手段２９は、要素文字列抽出手段２８が抽出した要素文字列が複数あるので、第３キーワードの中央に近い位置にある要素文字列“中耳”を、まず、第４キーワードとして、傷病名マスタから、図１６（ａ）に示すような、第４キーワード“中耳”を含む文字列（類似文字列）を抽出する。 Since there are a plurality of element character strings extracted by the element character string extraction means 28, the similar character string extraction means 29 first extracts the element character string "middle ear" located near the center of the third keyword as the fourth keyword. , a character string (similar character string) containing the fourth keyword "middle ear" as shown in FIG. 16(a) is extracted from the disease name master.

図１６（ｂ）に示すように、類似度算出手段３０は、抽出した類似文字列について、第３キーワード“真珠性中耳炎”との類似度をそれぞれ算出する。 As shown in FIG. 16(b), the similarity calculation means 30 calculates, for each of the extracted similar character strings, the similarity to the third keyword "pearl otitis media".

出力手段３１は、類似文字列抽出手段２９が抽出した類似文字列のうち、類似度算出手段３０が算出した類似度が所定値、例えば、０．８以上である類似文字列に対応する傷病名の情報を出力する。具体的には、出力手段３１は、傷病名「真珠腫性中耳炎」とＩＣＤコード「Ｈ７１」、傷病名「急性中耳炎」とＩＣＤコード「Ｈ６６９」、および、傷病名「慢性中耳炎」とＩＣＤコード「Ｈ６６９」の情報を出力する。なお、出力手段３１は、類似度算出手段３０が算出した類似度の情報をさらに出力する構成であってもよい。 Of the similar character strings extracted by the similar character string extraction means 29, the output means 31 outputs the information on the disease names corresponding to those whose similarity calculated by the similarity calculation means 30 is equal to or greater than a predetermined value, for example, 0.8. Specifically, the output means 31 outputs the disease name "cholesteatoma otitis media" with the ICD code "H71", the disease name "acute otitis media" with the ICD code "H669", and the disease name "chronic otitis media" with the ICD code "H669". Note that the output means 31 may be configured to additionally output the similarity information calculated by the similarity calculation means 30.
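
The fuzzy stage of the second example, for a single fourth keyword, can be sketched as a substring filter over the disease name master followed by a threshold on the similarity. The sketch is self-contained and uses an assumed length-normalized Levenshtein similarity; the names are hypothetical and the master holds only sample entries from this description. Because the exact similarity measure of the embodiment is not specified, the scores, and therefore which candidates clear a threshold of 0.8, need not match FIG. 16(b) or the output listed above.

```python
DISEASE_MASTER = {  # sample entries quoted in this description
    "真珠腫性中耳炎": "H71",
    "急性中耳炎": "H669",
    "慢性中耳炎": "H669",
    "インフルエンザ肺炎": "J110",
}


def levenshtein(a, b):
    """Edit distance between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similar_candidates(fourth_keyword, third_keyword, threshold=0.8):
    """Return (name, ICD code, score) for master entries that contain the
    fourth keyword and whose assumed similarity to the third keyword is at
    least threshold, best score first."""
    results = []
    for name, code in DISEASE_MASTER.items():
        if fourth_keyword not in name:
            continue  # not a similar character string for this keyword
        # Assumed normalization: 1 - distance / length of the longer string.
        score = 1.0 - levenshtein(name, third_keyword) / max(len(name), len(third_keyword))
        if score >= threshold:
            results.append((name, code, round(score, 3)))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

Passing a different `threshold` shows how strongly the output set depends on the chosen measure and cut-off.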

次に、第２実施形態のテキストデータ解析システム１の動作の一例について、フローチャートを参照しながら説明する。 Next, an example of the operation of the text data analysis system 1 of the second embodiment will be described with reference to a flowchart.

図１７に示すように、テキストデータ解析システム１は、文字列抽出ステップ（Ｓ１２０）で抽出した文字列を第１キーワードとして、傷病名マスタから、第１キーワードと一致する文字列がヒットするかを検索する（Ｓ２３１）（第１検索ステップ）。そして、検索の結果、第１キーワードと一致する文字列がヒットした場合（Ｓ２３２，Ｙｅｓ）、ステップＳ３０４に進む。 As shown in FIG. 17, the text data analysis system 1 uses the character string extracted in the character string extraction step (S120) as the first keyword, and checks whether a character string matching the first keyword is hit from the disease name master. Search (S231) (first search step). If a character string matching the first keyword is hit as a result of the search (S232, Yes), the process proceeds to step S304.

一方、第１検索ステップにおける検索の結果、第１キーワードと一致する文字列がヒットしない場合（Ｓ２３２，Ｎｏ）、テキストデータ解析システム１は、接尾語マスタを参照して、第１キーワードから、接尾語を取り除いた文字列を生成する（Ｓ２４０）（第１文字列生成ステップ）。 On the other hand, if the search result in the first search step does not hit a character string that matches the first keyword (S232, No), the text data analysis system 1 refers to the suffix master and extracts the suffix from the first keyword. A character string with words removed is generated (S240) (first character string generation step).

次に、テキストデータ解析システム１は、第１文字列生成ステップで生成した文字列を第２キーワードとして、傷病名マスタから、第２キーワードと一致する文字列がヒットするかを検索する（Ｓ２５１）（第２検索ステップ）。そして、検索の結果、第２キーワードと一致する文字列がヒットした場合（Ｓ２５２，Ｙｅｓ）、ステップＳ３０４に進む。 Next, the text data analysis system 1 uses the character string generated in the first character string generating step as the second keyword, and searches the disease name master for a character string that matches the second keyword (S251). (Second search step). Then, if a character string that matches the second keyword is hit as a result of the search (S252, Yes), the process proceeds to step S304.

一方、第２検索ステップにおける検索の結果、第２キーワードと一致する文字列がヒットしない場合（Ｓ２５２，Ｎｏ）、テキストデータ解析システム１は、接頭語マスタを参照して、第２キーワードから、接頭語を取り除いた文字列を生成する（Ｓ２６０）（第２文字列生成ステップ）。 On the other hand, if the search result in the second search step does not hit a character string that matches the second keyword (S252, No), the text data analysis system 1 refers to the prefix master and extracts the prefix from the second keyword. A character string with words removed is generated (S260) (second character string generation step).

次に、テキストデータ解析システム１は、第２文字列生成ステップで生成した文字列を第３キーワードとして、傷病名マスタから、第３キーワードと一致する文字列がヒットするかを検索する（Ｓ２７１）（第３検索ステップ）。そして、検索の結果、第３キーワードと一致する文字列がヒットした場合（Ｓ２７２，Ｙｅｓ）、ステップＳ３０４に進む。 Next, the text data analysis system 1 uses the character string generated in the second character string generating step as the third keyword, and searches the disease name master for a character string that matches the third keyword (S271). (third search step). As a result of the search, if a character string matching the third keyword is hit (S272, Yes), the process proceeds to step S304.

一方、第３検索ステップにおける検索の結果、第３キーワードと一致する文字列がヒットしない場合（Ｓ２７２，Ｎｏ）、テキストデータ解析システム１は、要素文字列マスタを参照して、第３キーワードから、要素文字列を抽出する（Ｓ２８０）（要素文字列抽出ステップ）。 On the other hand, if the search result in the third search step does not hit a character string that matches the third keyword (S272, No), the text data analysis system 1 refers to the element character string master, and from the third keyword, Element character strings are extracted (S280) (element character string extraction step).

次に、テキストデータ解析システム１は、抽出した要素文字列が複数あるかを判定する（Ｓ２９１）。抽出した要素文字列が複数ある場合（Ｓ２９１，Ｙｅｓ）、テキストデータ解析システム１は、先に類似文字列を抽出する要素文字列、具体的には、第３キーワードの中央に近い位置にある要素文字列を決定し（Ｓ２９２）、決定した要素文字列を第４キーワードとして、傷病名マスタから、第４キーワードを含む文字列（類似文字列）を抽出する（Ｓ２９３）。また、抽出した要素文字列が１つである場合（Ｓ２９１，Ｎｏ）、テキストデータ解析システム１は、当該要素文字列を第４キーワードとして、傷病名マスタから、類似文字列を抽出する（Ｓ２９３）（類似文字列抽出ステップ）。 Next, the text data analysis system 1 determines whether a plurality of element character strings have been extracted (S291). If a plurality of element character strings have been extracted (S291, Yes), the text data analysis system 1 determines the element character string for which similar character strings are to be extracted first, specifically, the element character string located near the center of the third keyword (S292), and, using the determined element character string as the fourth keyword, extracts the character strings containing the fourth keyword (similar character strings) from the disease name master (S293). If only one element character string has been extracted (S291, No), the text data analysis system 1 extracts similar character strings from the disease name master using that element character string as the fourth keyword (S293) (similar character string extraction step).

次に、テキストデータ解析システム１は、抽出した類似文字列と、第３キーワードとの類似度を算出する（Ｓ３０１）（類似度算出ステップ）。次に、テキストデータ解析システム１は、算出した類似度が所定以上である類似文字列があるかを判定する（Ｓ３０２）。類似度が所定以上である類似文字列がない場合（Ｓ３０２，Ｎｏ）、ステップＳ２９２に戻って、次の、先に類似文字列を抽出する要素文字列を決定し、以降の処理を実行する。一方、ステップＳ３０２において、類似度が所定以上である類似文字列がある場合（Ｙｅｓ）、テキストデータ解析システム１は、類似度が所定以上である類似文字列を抽出し（Ｓ３０３）、ステップＳ３０４に進む。 Next, the text data analysis system 1 calculates the similarity between the extracted similar character strings and the third keyword (S301) (similarity calculation step). Next, the text data analysis system 1 determines whether there is a similar character string whose calculated similarity is equal to or greater than a predetermined value (S302). If there is no such similar character string (S302, No), the process returns to step S292, determines the next element character string for which similar character strings are to be extracted, and executes the subsequent processing. On the other hand, if there is a similar character string whose similarity is equal to or greater than the predetermined value in step S302 (Yes), the text data analysis system 1 extracts the similar character strings whose similarity is equal to or greater than the predetermined value (S303) and proceeds to step S304.
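
The loop through steps S292 to S303 can be sketched as an early-exit iteration over the element character strings in priority order. The function `first_qualifying` and its parameters are hypothetical; the candidate search and the scoring are passed in as callables so that the control flow stands on its own. Returning an empty list when no element character string produces a qualifying candidate is an assumption, since the flowchart description does not state that case.

```python
def first_qualifying(ordered_elements, find_candidates, score, threshold):
    """Process element strings in priority order and stop at the first one
    that yields at least one candidate with score >= threshold.

    ordered_elements: element character strings, highest priority first (S292)
    find_candidates:  callable returning similar character strings for an element (S293)
    score:            callable returning the similarity of a candidate (S301)
    """
    for element in ordered_elements:
        candidates = find_candidates(element)
        scored = [(c, score(c)) for c in candidates]
        hits = [(c, s) for c, s in scored if s >= threshold]  # S302
        if hits:
            return hits  # S303: remaining elements are not processed
    return []  # assumption: nothing qualified for any element
```

Because the function returns as soon as one element character string succeeds, the lower-priority element character strings are never searched or scored, which is the source of the reduced processing amount noted for this embodiment.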

ステップＳ３０４において、テキストデータ解析システム１は、次の、一の項目を表す一群の文字列があるかを判定する。そして、次の文字列がある場合（Ｓ３０４，Ｙｅｓ）、ステップＳ１２０に戻って以降の処理を実行し、次の文字列がない場合（Ｓ３０４，Ｎｏ）、ステップＳ３１１に進む。 In step S304, the text data analysis system 1 determines whether there is a group of character strings representing the next item. If there is a next character string (S304, Yes), the process returns to step S120 to execute subsequent processing, and if there is no next character string (S304, No), the process proceeds to step S311.

ステップＳ３１１において、テキストデータ解析システム１は、第１検索ステップ、第２検索システムまたは第３検索ステップでヒットした文字列に対応する傷病名の情報を抽出し、抽出した傷病名の情報を出力する（Ｓ３１２）。または、テキストデータ解析システム１は、類似文字列抽出ステップで抽出した類似文字列に対応する少なくとも１つの傷病名の情報を抽出し（Ｓ３１１）、抽出した傷病名の情報を出力する（Ｓ３１２）（出力ステップ）。 In step S311, the text data analysis system 1 extracts the information on the disease name corresponding to the character string hit in the first search step, the second search step, or the third search step, and outputs the extracted disease name information (S312). Alternatively, the text data analysis system 1 extracts the information on at least one disease name corresponding to the similar character strings extracted in the similar character string extraction step (S311), and outputs the extracted disease name information (S312) (output step).

以上の第２実施形態によれば、マスタにキーワードを必要以上に追加していくことなく、テキストデータから傷病名を表す文字列を抽出することができる。また、テキストデータから傷病名を表す文字列を、ＩＣＤ１０対応標準病名マスタに収載された形式に変換して抽出することができる。 According to the second embodiment described above, it is possible to extract a character string representing an injury or disease name from text data without adding keywords more than necessary to the master. In addition, it is possible to extract a character string representing an injury or disease name from text data by converting it into a format listed in the standard disease name master corresponding to ICD10.

また、要素文字列抽出手段２８、類似文字列抽出手段２９および出力手段３１をさらに備えることで、テキストデータから傷病名を表す文字列をより確実に抽出することができる。 Moreover, by further providing the element character string extraction means 28, the similar character string extraction means 29, and the output means 31, the character string representing the disease name can be extracted more reliably from the text data.

また、類似度算出手段３０をさらに備えることで、傷病名の情報を絞り込んで出力することができる。 Moreover, by further providing the similarity calculation means 30, it is possible to narrow down and output the information on the disease name.

また、先に類似度を算出した類似文字列の中に類似度が所定以上である類似文字列がある場合に、当該類似文字列に対応する傷病名の情報を出力して処理を終了するので、傷病名の情報を出力するまでの処理量を少なくして処理速度を速くすることができる。 In addition, if there is a similar character string whose similarity is equal to or higher than a predetermined similarity among the similar character strings whose similarity has been calculated in advance, the information of the disease name corresponding to the similar character string is output and the process is terminated. , the processing speed can be increased by reducing the amount of processing up to the output of the information on the disease name.

なお、第２実施形態では、傷病名マスタは、ＩＣＤコードを記憶していたが、その他のコード、例えば、ＩＣＤ１０対応標準病名マスタの病名管理番号などを記憶するものであってもよい。また、第２実施形態では、傷病名マスタは、ＩＣＤ１０対応標準病名マスタに基づいて作成されていたが、例えば、基本マスタの傷病名マスタなどに基づいて作成してもよい。 In the second embodiment, the injury and disease name master stores ICD codes, but may store other codes such as disease name management numbers of standard disease name master corresponding to ICD10. In the second embodiment, the injury and disease name master is created based on the ICD 10 compatible standard disease name master, but it may be created based on the injury and disease name master of the basic master, for example.

以上、実施形態について説明したが、本発明は前記実施形態に限定されることなく、以下に例示するように適宜変形して実施することができる。 Although the embodiments have been described above, the present invention is not limited to the above-described embodiments, and can be implemented with appropriate modifications as exemplified below.

例えば、前記実施形態では、抽出された要素文字列が複数ある場合、所定のキーワードの中央に近い位置にある要素文字列について、他の要素文字列よりも先に類似度を算出したが、これに限定されない。例えば、抽出された要素文字列が複数ある場合、抽出された各要素文字列について、それぞれ類似文字列を抽出し、抽出した類似文字列の数が他よりも少ないものについて、先に類似度を算出し、先に類似度を算出した類似文字列の中に類似度が所定以上である文字列がある場合、当該類似文字列に対応する医療行為等の情報を出力して、処理を終了してもよい。また、要素文字列マスタに、要素文字列と、類似度を算出する際の優先順位を表す数字等とを対応させて記憶させておいてもよい。 For example, in the above embodiments, when a plurality of element character strings are extracted, the similarity is calculated for the element character string located near the center of the predetermined keyword before the other element character strings, but the invention is not limited to this. For example, when a plurality of element character strings are extracted, similar character strings may be extracted for each of the extracted element character strings, the similarity may be calculated first for the element character string that yields fewer similar character strings than the others, and, if the similar character strings for which the similarity was calculated first include a character string whose similarity is equal to or greater than the predetermined value, information such as the medical practice corresponding to that similar character string may be output and the processing terminated. Further, the element character string master may store each element character string in association with a number or the like representing its priority when calculating the similarity.
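
The modification in which the element character string yielding fewer similar character strings is processed first could be sketched as a sort on candidate counts. The function name `order_by_candidate_count` and the plain list of master names are hypothetical, introduced only for illustration.

```python
def order_by_candidate_count(elements, master_names):
    """Return elements sorted so that the one contained in the fewest
    master entries (fewest similar character strings) comes first."""
    def candidate_count(element):
        return sum(1 for name in master_names if element in name)
    return sorted(elements, key=candidate_count)
```

With the sample names "真珠腫性中耳炎", "急性中耳炎" and "慢性中耳炎", the element "真珠" matches one entry and "中耳" matches three, so "真珠" would be processed first under this modification.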

Further, in the above embodiments, the similarity calculation means calculates the similarity between each similar character string extracted by the similar character string extraction means and the predetermined keyword, and the output means outputs information corresponding to the similar character strings whose calculated similarity is equal to or higher than a predetermined value; however, the invention is not limited to this. For example, the output means may be configured so that, if the number of similar character strings extracted by the similar character string extraction means is equal to or less than a predetermined number, it outputs the information corresponding to those similar character strings without any similarity being calculated.
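As a rough illustration of this variation, the sketch below outputs all candidates unscored when there are few of them and filters by similarity otherwise. The function name, the limit of 3, the threshold of 0.6, and the use of `difflib.SequenceMatcher` are assumptions made for the example, not values taken from the description.

```python
from difflib import SequenceMatcher


def select_outputs(keyword, candidates, max_unscored=3, threshold=0.6):
    """Return the candidate strings whose information should be output."""
    if len(candidates) <= max_unscored:
        # Few enough candidates: output them all without scoring.
        return list(candidates)
    # Otherwise keep only candidates that are similar enough to the keyword.
    return [
        c for c in candidates
        if SequenceMatcher(None, keyword, c).ratio() >= threshold
    ]
```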

The elements described in the above embodiments and modifications may be implemented in any combination. For example, when the first embodiment and the second embodiment are implemented in combination, the prefix master of the first embodiment and the prefix master of the second embodiment may be a common master, and the element character string master of the first embodiment and the element character string master of the second embodiment may be a common master.

1 text data analysis system
11 data acquisition means
12 character string extraction means
13 first search means
14 first character string generation means
15 second search means
16 second character string generation means
17 similar character string extraction means
18 first similarity calculation means
19 output means
20 second similarity calculation means
21 element character string calculation means
22 third similarity calculation means
23 first search means
24 first character string generation means
25 second search means
26 second character string generation means
27 third search means
28 element character string extraction means
29 similar character string extraction means
30 similarity calculation means
31 output means

Claims

A text data analysis system comprising:
data acquisition means for acquiring text data;
character string extraction means for extracting, from the text data acquired by the data acquisition means, a group of character strings representing one item;
first search means for searching a medical practice/pharmaceutical master, which stores character strings representing medical practices or pharmaceuticals, for a character string matching a first keyword, the first keyword being the character string extracted by the character string extraction means, and for outputting, when a matching character string is hit, information on the medical practice or pharmaceutical corresponding to the hit character string;
first character string generation means for generating, when the search by the first search means hits no character string matching the first keyword, a character string in which a prefix has been removed from the first keyword, by referring to a prefix master that stores prefixes, each being a specific character string attached to the beginning of a character string;
second search means for searching the medical practice/pharmaceutical master for a character string matching a second keyword, the second keyword being the character string generated by the first character string generation means, and for outputting, when a matching character string is hit, information on the medical practice or pharmaceutical corresponding to the hit character string;
second character string generation means for generating, when the search by the second search means hits no character string matching the second keyword, a character string in which at least parentheses and the character string enclosed by the parentheses have been removed from the second keyword;
similar character string extraction means for extracting, from the medical practice/pharmaceutical master, character strings containing a third keyword, the third keyword being the character string generated by the second character string generation means; and
output means for outputting information on at least one medical practice or pharmaceutical corresponding to the character strings extracted by the similar character string extraction means.
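The staged search recited above (exact match, exact match after prefix removal, then substring match after parenthesis removal) can be sketched roughly as follows. The sample master, the prefix list, and all function names are hypothetical and serve only to make the flow concrete; they are not taken from the specification.

```python
import re

# Hypothetical sample data; real masters would be far larger.
MASTER = {
    "blood test": "code-001",
    "chest x-ray": "code-002",
    "chest x-ray (2 views)": "code-003",
}
PREFIXES = ["*", "re-"]


def strip_prefix(keyword, prefixes):
    """Remove the first matching prefix from the keyword, if any."""
    for p in prefixes:
        if keyword.startswith(p):
            return keyword[len(p):]
    return keyword


def strip_parentheses(keyword):
    """Remove ASCII or full-width parentheses together with their contents."""
    return re.sub(r"[(（][^)）]*[)）]", "", keyword).strip()


def analyze(first_keyword, master=MASTER, prefixes=PREFIXES):
    # First search: exact match on the extracted string.
    if first_keyword in master:
        return [master[first_keyword]]
    # Second search: exact match after prefix removal.
    second_keyword = strip_prefix(first_keyword, prefixes)
    if second_keyword in master:
        return [master[second_keyword]]
    # Similar-string extraction: substring match after parenthesis removal.
    third_keyword = strip_parentheses(second_keyword)
    if not third_keyword:
        return []
    return [info for name, info in master.items() if third_keyword in name]
```

Because each stage only relaxes the keyword when the previous stage misses, spelling variants can be resolved without registering every variant in the master.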

The text data analysis system according to claim 1, further comprising first similarity calculation means for calculating, when the similar character string extraction means extracts a plurality of character strings, a similarity between each extracted character string and the second keyword,
wherein the output means outputs information on the medical practice or pharmaceutical corresponding to those character strings, among the character strings extracted by the similar character string extraction means, whose similarity calculated by the first similarity calculation means is equal to or higher than a predetermined value.

The text data analysis system according to claim 1, wherein the medical practice/pharmaceutical master stores character strings representing one or more medical practices or pharmaceuticals in association with an index character string, the index character string being a character string contained in those character strings,
the text data analysis system further comprises second similarity calculation means for calculating, when the similar character string extraction means fails to extract a character string containing the third keyword from the medical practice/pharmaceutical master, a similarity between the third keyword and each index character string, and
the similar character string extraction means extracts, when there is an index character string whose similarity calculated by the second similarity calculation means is equal to or higher than a predetermined value, the character strings corresponding to that index character string from the medical practice/pharmaceutical master.

The text data analysis system according to claim 3, further comprising element character string extraction means for extracting at least one element character string from the second keyword, when there is no index character string whose similarity calculated by the second similarity calculation means is equal to or higher than the predetermined value, by referring to an element character string master that stores element character strings, each being a specific character string contained in a character string representing a medical practice or pharmaceutical,
wherein the similar character string extraction means extracts, from the medical practice/pharmaceutical master, character strings containing a fourth keyword, the fourth keyword being the element character string extracted by the element character string extraction means.

The text data analysis system according to claim 4, further comprising third similarity calculation means for calculating, when the element character string extraction means extracts a plurality of element character strings, the similarity to the second keyword of the character strings extracted by the similar character string extraction means using, as the fourth keyword, the element character string located near the center of the second keyword, before doing so for the other element character strings,
wherein the output means outputs, when the character strings whose similarity was calculated first by the third similarity calculation means include a character string whose similarity is equal to or higher than a predetermined value, information on the medical practice or pharmaceutical corresponding to that character string.

The text data analysis system according to any one of claims 1 to 5, wherein the second character string generation means generates the character string by removing from the second keyword at least one of the following character strings (1) to (5):
(1) blanks at the beginning or end;
(2) a blank in the middle and the character string following the blank;
(3) middle dots (・);
(4) commas;
(5) a number and the character string immediately following the number that represents a unit.
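The five removal rules listed above could be applied in sequence roughly as shown below. This is a hedged sketch: the function name, the order in which the rules are applied, the set of comma characters, and the short list of unit strings are all assumptions chosen for illustration, and the claim itself only requires that at least one of the rules be applied.

```python
import re

# Hypothetical unit list; a real system would need a fuller set of units.
UNIT_PATTERN = re.compile(r"\d+(?:\.\d+)?(?:mg|ml|g|%)", re.IGNORECASE)


def normalize(keyword: str) -> str:
    """Apply removal rules (1) to (5) to a keyword, in that order."""
    s = keyword.strip(" \u3000")                    # (1) leading/trailing blanks
    s = re.split(r"[ \u3000]", s, maxsplit=1)[0]    # (2) middle blank and what follows
    s = s.replace("\u30fb", "")                     # (3) middle dot
    s = s.replace(",", "").replace("\u3001", "")    # (4) commas (ASCII and ideographic)
    s = UNIT_PATTERN.sub("", s)                     # (5) number followed by a unit
    return s
```

Note that with this ordering, rule (2) discards everything after the first interior blank before rule (5) runs, so a dose separated from the name by a space is removed by rule (2) rather than rule (5).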

A text data analysis system comprising:
data acquisition means for acquiring text data;
character string extraction means for extracting, from the text data acquired by the data acquisition means, a group of character strings representing one item;
first search means for searching an injury and disease name master, which stores character strings representing injury and disease names, for a character string matching a first keyword, the first keyword being the character string extracted by the character string extraction means, and for outputting, when a matching character string is hit, information on the injury or disease name corresponding to the hit character string;
first character string generation means for generating, when the search by the first search means hits no character string matching the first keyword, a character string in which a suffix has been removed from the first keyword, by referring to a suffix master that stores suffixes, each being a specific character string attached to the end of a character string;
second search means for searching the injury and disease name master for a character string matching a second keyword, the second keyword being the character string generated by the first character string generation means, and for outputting, when a matching character string is hit, information on the injury or disease name corresponding to the hit character string;
second character string generation means for generating, when the search by the second search means hits no character string matching the second keyword, a character string in which a prefix has been removed from the second keyword, by referring to a prefix master that stores prefixes, each being a specific character string attached to the beginning of a character string; and
third search means for searching the injury and disease name master for a character string matching a third keyword, the third keyword being the character string generated by the second character string generation means, and for outputting, when a matching character string is hit, information on the injury or disease name corresponding to the hit character string.

The text data analysis system according to claim 7, further comprising:
element character string extraction means for extracting at least one element character string from the third keyword, when the search by the third search means hits no character string matching the third keyword, by referring to an element character string master that stores element character strings, each being a specific character string contained in an injury or disease name;
similar character string extraction means for extracting, from the injury and disease name master, character strings containing a fourth keyword, the fourth keyword being the element character string extracted by the element character string extraction means; and
output means for outputting information on at least one injury or disease name corresponding to the character strings extracted by the similar character string extraction means.

The text data analysis system according to claim 8, further comprising similarity calculation means for calculating a similarity between each character string extracted by the similar character string extraction means and the third keyword,
wherein the output means outputs information on the injury or disease names corresponding to those character strings, among the character strings extracted by the similar character string extraction means, whose similarity calculated by the similarity calculation means is equal to or higher than a predetermined value.

The text data analysis system according to claim 9, wherein, when the element character string extraction means extracts a plurality of element character strings, the similarity calculation means calculates the similarity to the third keyword of the character strings extracted by the similar character string extraction means using, as the fourth keyword, the element character string located near the center of the third keyword, before doing so for the other element character strings, and
the output means outputs information on the injury or disease name corresponding to a character string whose similarity is equal to or higher than a predetermined value among the character strings whose similarity was calculated first.

A text data analysis method in which a computer executes:
a data acquisition step of acquiring text data;
a character string extraction step of extracting, from the text data acquired in the data acquisition step, a group of character strings representing one item;
a first search step of searching a medical practice/pharmaceutical master, which stores character strings representing medical practices or pharmaceuticals, for a character string matching a first keyword, the first keyword being the character string extracted in the character string extraction step, and of outputting, when a matching character string is hit, information on the medical practice or pharmaceutical corresponding to the hit character string;
a first character string generation step of generating, when the search in the first search step hits no character string matching the first keyword, a character string in which a prefix has been removed from the first keyword, by referring to a prefix master that stores prefixes, each being a specific character string attached to the beginning of a character string;
a second search step of searching the medical practice/pharmaceutical master for a character string matching a second keyword, the second keyword being the character string generated in the first character string generation step, and of outputting, when a matching character string is hit, information on the medical practice or pharmaceutical corresponding to the hit character string;
a second character string generation step of generating, when the search in the second search step hits no character string matching the second keyword, a character string in which at least parentheses and the character string enclosed by the parentheses have been removed from the second keyword;
a similar character string extraction step of extracting, from the medical practice/pharmaceutical master, character strings containing a third keyword, the third keyword being the character string generated in the second character string generation step; and
an output step of outputting information on at least one medical practice or pharmaceutical corresponding to the character strings extracted in the similar character string extraction step.

A text data analysis method in which a computer executes:
a data acquisition step of acquiring text data;
a character string extraction step of extracting, from the text data acquired in the data acquisition step, a group of character strings representing one item;
a first search step of searching an injury and disease name master, which stores character strings representing injury and disease names, for a character string matching a first keyword, the first keyword being the character string extracted in the character string extraction step, and of outputting, when a matching character string is hit, information on the injury or disease name corresponding to the hit character string;
a first character string generation step of generating, when the search in the first search step hits no character string matching the first keyword, a character string in which a suffix has been removed from the first keyword, by referring to a suffix master that stores suffixes, each being a specific character string attached to the end of a character string;
a second search step of searching the injury and disease name master for a character string matching a second keyword, the second keyword being the character string generated in the first character string generation step, and of outputting, when a matching character string is hit, information on the injury or disease name corresponding to the hit character string;
a second character string generation step of generating, when the search in the second search step hits no character string matching the second keyword, a character string in which a prefix has been removed from the second keyword, by referring to a prefix master that stores prefixes, each being a specific character string attached to the beginning of a character string; and
a third search step of searching the injury and disease name master for a character string matching a third keyword, the third keyword being the character string generated in the second character string generation step, and of outputting, when a matching character string is hit, information on the injury or disease name corresponding to the hit character string.

A computer program causing a computer to function as:
data acquisition means for acquiring text data;
character string extraction means for extracting, from the text data acquired by the data acquisition means, a group of character strings representing one item;
first search means for searching a medical practice/pharmaceutical master, which stores character strings representing medical practices or pharmaceuticals, for a character string matching a first keyword, the first keyword being the character string extracted by the character string extraction means, and for outputting, when a matching character string is hit, information on the medical practice or pharmaceutical corresponding to the hit character string;
first character string generation means for generating, when the search by the first search means hits no character string matching the first keyword, a character string in which a prefix has been removed from the first keyword, by referring to a prefix master that stores prefixes, each being a specific character string attached to the beginning of a character string;
second search means for searching the medical practice/pharmaceutical master for a character string matching a second keyword, the second keyword being the character string generated by the first character string generation means, and for outputting, when a matching character string is hit, information on the medical practice or pharmaceutical corresponding to the hit character string;
second character string generation means for generating, when the search by the second search means hits no character string matching the second keyword, a character string in which at least parentheses and the character string enclosed by the parentheses have been removed from the second keyword;
similar character string extraction means for extracting, from the medical practice/pharmaceutical master, character strings containing a third keyword, the third keyword being the character string generated by the second character string generation means; and
output means for outputting information on at least one medical practice or pharmaceutical corresponding to the character strings extracted by the similar character string extraction means.

A computer program causing a computer to function as:
data acquisition means for acquiring text data;
character string extraction means for extracting, from the text data acquired by the data acquisition means, a group of character strings representing one item;
first search means for searching an injury and disease name master, which stores character strings representing injury and disease names, for a character string matching a first keyword, the first keyword being the character string extracted by the character string extraction means, and for outputting, when a matching character string is hit, information on the injury or disease name corresponding to the hit character string;
first character string generation means for generating, when the search by the first search means hits no character string matching the first keyword, a character string in which a suffix has been removed from the first keyword, by referring to a suffix master that stores suffixes, each being a specific character string attached to the end of a character string;
second search means for searching the injury and disease name master for a character string matching a second keyword, the second keyword being the character string generated by the first character string generation means, and for outputting, when a matching character string is hit, information on the injury or disease name corresponding to the hit character string;
second character string generation means for generating, when the search by the second search means hits no character string matching the second keyword, a character string in which a prefix has been removed from the second keyword, by referring to a prefix master that stores prefixes, each being a specific character string attached to the beginning of a character string; and
third search means for searching the injury and disease name master for a character string matching a third keyword, the third keyword being the character string generated by the second character string generation means, and for outputting, when a matching character string is hit, information on the injury or disease name corresponding to the hit character string.