JP5682031B2 - Method and system for generating text

Info

Publication number: JP5682031B2
Application number: JP2010288365A
Authority: JP
Inventors: ピーターズリンゼイ; レイバースティモシー
Original assignee: パシフィックノレッジシステムズピーティーワイリミテッド
Priority date: 2010-10-12
Filing date: 2010-12-24
Publication date: 2015-03-11
Anticipated expiration: 2030-12-24
Also published as: JP2012084109A

Description

The present invention relates generally to methods and systems for generating text, and in particular, but not exclusively, to methods and systems for generating syntactically correct text for reports.

The exponential growth in computing power since the mid-20th century, including processing speed and memory capacity, has dramatically increased the usefulness of computers in every sector of society and indeed in our daily lives. One of the major uses of computers is the generation and storage of ever-increasing amounts of data. However, raw data by itself has limited value. In most cases, its true value is realized only when someone interprets the raw data and provides the necessary understanding and insight. This interpretation process is a value-adding process that converts "data" into "knowledge" and often into a "judgment". This knowledge or judgment is often expressed in a textual report.

Computerized processes help to extract, collate, and store both numerical and textual data, but the ability of a human or a computer to interpret this data effectively can be limited by the large volume of data and its associated complexity.

For a human, the ability to make a judgment that interprets a body of data accurately and in a timely manner requires that the data be preprocessed and sufficiently reduced so that its significant features become evident.

For a rule-based expert system, there is a further related requirement that each rule be as general as possible, in order to avoid the proliferation of rules that would otherwise be needed to account for every peculiarity of a large and complex data set. More general rules are built using higher-level abstractions from the data set, so variations in the details of the underlying data do not necessarily invalidate those rules. These higher-level abstractions are precisely the significant features used by the human experts who build rule-based expert systems.

That is, just like a human expert, an expert system requires complex data to be reduced to a form in which inferences can be made from a smaller set of significant features rather than from a large set of raw data values.

The challenge, therefore, is to find a way to reduce the complexity of the data to be interpreted by preprocessing it into a smaller, less complex set of significant values that can be presented to a human or a computer for subsequent interpretation.

Two main factors contribute to data complexity.

The first factor is the sheer number of data item values that may need to be interpreted, that is, the case where a given system contains a large number of elements that need to be analyzed.

For example, to produce a patient test report for a referring physician, a laboratory pathologist may have to interpret the results of hundreds of protein biomarkers used by the diagnostic instrument that analyzed the patient's blood sample.

The second factor that adds to complexity is the size of the individual data values themselves and their possibly unstructured ("free-form") format. A single numeric or enumerated value (that is, a text code) may be relatively easy to interpret by itself, because there is an obvious association between this "atomic" value and the corresponding data item, for example a troponin value of 3.4 mmol/L.

However, a large piece of free-form text may contain ambiguous expressions, typographical errors, abbreviations, two or more data values, or one of many different possible interpretations of the same data value, all of which make interpretation far more difficult.

For example, to produce a patient test report for a referring physician, a laboratory pathologist may have to interpret machine-generated test results in the light of a lengthy textual patient history provided by the referring physician. The history is complex because it is a large, unstructured data item, and relatively small variations in the text can completely change the resulting interpretation. For example, the abbreviated phrases "DM" (known diabetes mellitus), "FH DM" (family history of diabetes mellitus), "?DM" (suspected diabetes), and "not DM" (not diabetic) each change the pathologist's interpretation of a given set of glucose test results. Note also that synonyms within the clinical notes ("DM", "Diabetic", "Diab", "Diabetes noted"), misspellings ("Diabetes Mellitis"), and changes in word order ("?DM", "DM?") must all be understood by the pathologist when making the interpretation.

The history may also contain the phrase "on Zocor" or "on lipid lower treatment", either of which expresses a second concept that tells the pathologist whether the patient is taking some kind of cardiac medication. Phrases of this kind likewise affect the pathologist's interpretation of the test results and the report produced for the referring physician.

In the specific example "DM, on Zocor", there is no obvious association between the "history" data item and an atomic value. Rather, the history, as a complex data item, implicitly contains two simpler, atomic data items, for example "diabetes (yes)" and "on medication (yes)".
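
The decomposition of a free-text history into atomic data items can be sketched as follows. This is only an illustration under stated assumptions, not the method claimed in this document: the phrase table `PHRASES`, the item names `diabetes` and `on_medication`, and the function `extract_items` are all hypothetical and cover only the phrases quoted above.

```python
import re

# Hypothetical phrase table (an assumption for this sketch). Each pattern maps
# to an (item, value) pair. More specific phrases come before the bare "DM",
# so that "not DM" or "FH DM" is not read as a plain "DM".
PHRASES = [
    (r"\bnot\s+dm\b", ("diabetes", "no")),
    (r"\bfh\s*dm\b", ("diabetes", "family history")),
    (r"\?\s*dm\b|\bdm\s*\?", ("diabetes", "suspected")),
    (r"\bdm\b|\bdiab\w*", ("diabetes", "yes")),
    (r"\bon\s+zocor\b|\bon\s+lipid\s+lower\w*\s+treatment\b",
     ("on_medication", "yes")),
]


def extract_items(history: str) -> dict:
    """Return the atomic data items implied by a free-text clinical history."""
    text = history.lower()
    items = {}
    for pattern, (item, value) in PHRASES:
        if re.search(pattern, text):
            # The first (most specific) matching phrase decides the value.
            items.setdefault(item, value)
    return items


print(extract_items("DM, on Zocor"))
```

A fixed phrase table like this breaks down quickly as the variations multiply, which is the maintenance problem the surrounding text describes.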

Another example of this second type of complexity, arising from the size of data item values and their lack of structure, is where a first laboratory performs some of a patient's tests "in-house" but sends the blood sample to a second laboratory for some more specialized tests. The second laboratory returns its test results in a textual report. From the point of view of the pathologist at the first laboratory, the report received from the second laboratory is a complex data item. To produce the final report for the referring physician, the pathologist must interpret this report in addition to the results obtained at the first laboratory.

Another example of a clinical domain involving complex data is the allergy domain. Here, in order to identify the one or more relevant allergens, potential allergens need to be tested for in a blood sample and then matched against the symptoms in a possibly lengthy free-form patient history. Other examples include infectious diseases (identification of the pathogen) and multi-organ diseases (for example, identification of an underlying neurological, endocrinological, or oncological cause).

Similar difficulties in interpreting complex data also arise in non-medical domains such as fraud detection (for example, in the reissue of airline tickets, driver's licenses, and passports, in credit card purchases, and in e-commerce), auditing in logistics, inventory management, serial number management (for example, in detecting counterfeits or for product recall purposes), and IT support services.

In the airline fraud detection example, a large number of events containing unstructured or semi-structured data about ticket sales and passenger flights are recorded and then need to be matched against the pricing fare table and other criteria for ticket reissue in order to identify whether the correct price was applied to a given ticket. This is a laborious task, because the information contained in the fare table and the tickets is unstructured or at best semi-structured, and each set is interpreted individually by a human expert to determine whether the conditions expressed in the fare table have actually been complied with.

To enable efficient and accurate interpretation by a human expert, the complex data of (in this example) the fare table needs to be reduced to the set of conditions applicable to a particular ticket. The relevant characteristics of that ticket (departure and destination cities, travel date, travel class, and price) also need to be extracted. Once the fare table and ticket data have been preprocessed into these significant features, a human expert can judge whether a fraudulent or erroneous ticketing event has occurred.

Real estate valuation is another area that requires the interpretation of complex data. In this domain, the required interpretation is a valuation consisting of a monetary amount and a supporting explanation. The data to be interpreted consists of a variety of complex and heterogeneous data, including the size of the house and land, the orientation of the house, the local postcode and recent valuations, and other comparable properties. Free-form text notes describing various characteristics of the property (for example, a view obstructed by an adjacent high-rise apartment block) may contain important factors that affect the valuation and therefore require interpretation.

Another example of a non-medical domain requiring the interpretation of complex data is IT support services. Consider an online transaction processing system through which a company provides regular value-added output, such as news feeds or other reports, to subscribing customers.

The reliability of the company's online transaction processing system is essential to the performance of this service. To achieve a very high level of reliability, the system must be monitored continuously for every factor that could affect its reliability.

These factors include resource utilization such as transaction rates, user activity, memory, disk, and CPU, as well as alerts and warnings raised by the operating system and by the transaction processing application itself. The standard way of recording these factors is to log all such information continuously to a central facility, such as a log file, where it can be analyzed periodically by the company's IT support staff. The goal is for the IT support staff to respond to any serious alert or worrying trend recorded in the log file before the online transaction system fails.

Because log entries are generated by various operating system or application components, often belonging to products from different vendors, they are not formatted according to any universal coding system and are essentially free text. For a large online transaction processing system, the log file can become very large, for example tens of megabytes per day, which is beyond what IT support staff can inspect manually. Furthermore, certain classes of alert may require immediate action, in which case the alert and the corresponding remedial action may need to be determined quickly.

As in the previous example, to enable efficient and accurate interpretation by a human expert, the complex data in the log file needs to be preprocessed into a set of significant features (for example, alerts and trend status conditions) on which the human expert can base a judgment as to whether any remedial action needs to be taken.
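
As a rough illustration of reducing a free-text log to a few significant features, the sketch below counts entries by severity and sets a single flag. The severity keywords, the feature names, and the function `summarize_log` are assumptions made for this example; real log formats differ between vendors, as noted above.

```python
import re
from collections import Counter

# Assumed severity keywords; actual products use many different conventions.
SEVERITY = re.compile(r"\b(critical|error|warning)\b", re.IGNORECASE)


def summarize_log(lines):
    """Reduce raw log lines to a small set of significant features."""
    counts = Counter()
    for line in lines:
        match = SEVERITY.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return {
        "critical_alerts": counts["critical"],
        "errors": counts["error"],
        "warnings": counts["warning"],
        # One condition that a person (or a rule) can act on directly.
        "needs_attention": counts["critical"] > 0,
    }


sample = [
    "2010-10-12 03:14 disk WARNING: 91% full",
    "2010-10-12 03:15 app CRITICAL transaction queue stalled",
    "2010-10-12 03:16 user login ok",
]
print(summarize_log(sample))
```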

Computer-based expert systems attempt to mimic the human interpretation process. For example, RippleDown, as described in US Pat. No. 6,553,361, is a computer-based expert system (a decision engine) that is taught by a domain expert how to make highly specific interpretations on a case-by-case basis.

Like a human expert, a rule-based expert system needs the data to be presented to it in terms of the relevant significant features so that it can make inferences from those features. If inferences were made from complex raw data (for example, the data in the fare table and the ticket itself), not only would the number of specific rules required become unmanageable, but, once the rules had been built, they would fail to interpret any newly encountered variation in a fare table or ticket.

In high-transaction environments, expert systems can play an important role in augmenting human expertise to provide rapid interpretation of raw data. For example, a pathology laboratory may need to provide interpretive reports for tens of thousands of patients per day, far beyond the capacity of the few pathologists the laboratory can employ.

However, the ability of an expert system to interpret data is limited by the same factor that limits human experts, namely data complexity. Complex data needs to be preprocessed into a form that allows rules to be built using higher-level concepts of the kind used by human experts, thereby avoiding the proliferation of rules and report definitions that would otherwise result.

Two more detailed and specific examples of the data complexity problem are presented below.

The first more specific example is in the field of medical pathology. In this field, the complex investigations normally performed by specialists such as pathologists often require a large number of tests. Interpreting the test results is often difficult and requires the skill of a specialist or an expert system. The specialist or expert system generates text for inclusion in a report that contains a useful analysis and interpretation of the test results, sometimes in highly summarized form, and the report is sent to a referring physician (for example, a family doctor) who may not have the expertise to interpret the raw test results themselves. To date, expert system knowledge bases have been built for domains in which the tests are relatively independent of one another. For example, a knowledge base for thyroid reports primarily considers the results of thyroid function tests (that is, TSH, FT3, and FT4). Other patient demographic data, such as age and sex, generally also need to be considered, as do findings recorded in the clinical notes from a physical examination or a dictated history. Reports generated using these knowledge bases, in addition to referring to these individual tests and their values, also provide a diagnosis and often give recommendations for treatment and follow-up testing. Typically, in these domains there are fewer than 20 tests to consider, together with patient demographic data such as age and sex and the findings in the clinical notes provided by the practitioner. The test results may be related to some extent, because they can interact (for example, if one test is abnormal, another is also likely to be abnormal), but as long as the number of tests and test interactions to be considered is small, the rules in the knowledge base can refer to the individual test results themselves and still retain their generality. That is, the test results do not need to be reduced to a smaller set of significant features by some preprocessing step before interpretation.

A specific rule, consisting of a text comment that is given under certain conditions, can be written by considering each individual test result or by considering a relatively small number of significant combinations of test results. For example, for a panel of thyroid tests, if the TSH test result is elevated, the comment "consistent with primary hypothyroidism" may be generated.
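
A rule of this kind, a condition on an individual test result paired with a text comment, might look like the following sketch. The threshold `TSH_UPPER_LIMIT` and the function `thyroid_comments` are hypothetical; the document does not state a reference range, and real ranges depend on the laboratory.

```python
# Illustrative threshold only (an assumption); not taken from this document.
TSH_UPPER_LIMIT = 4.0  # mIU/L


def thyroid_comments(results: dict) -> list:
    """Return the comments triggered by a panel of thyroid test results."""
    comments = []
    tsh = results.get("TSH")
    if tsh is not None and tsh > TSH_UPPER_LIMIT:
        comments.append("Consistent with primary hypothyroidism.")
    return comments


print(thyroid_comments({"TSH": 9.2, "FT4": 8.0}))
```

Because the rule refers to a single test directly, it stays general as long as the number of tests and interactions is small.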

Traditional clinical domains, such as the thyroid example above, have only a small number of Attributes. However, for newer clinical domains in which there may be hundreds or even thousands of possible investigations, it becomes impossible to apply a specific rule to each kind of investigation. For example, a practitioner may request a number of food allergy tests, such as peanut, soy, milk, wheat, and egg. If soy and milk return very high positive values (for example, 24.3 and 30.1, respectively) and the other tests are negative, the pathologist will want the report returned to the physician to include a comment such as "Very high results were detected for milk (30.1) and soy (24.3)".

The rule that allows the interpretation of the test data to produce this comment is based on the following conditions:
10 ≤ milk ≤ 50, indicating a very high result
10 ≤ soy ≤ 50, indicating another very high result
milk > soy, indicating that the milk value should precede the soy value in the report
peanut = 0
wheat = 0, egg = 0
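
The conditions listed above can be written out as a single hard-coded rule. This is a minimal sketch; the names `milk_soy_rule` and `comment_for` and the dictionary layout are assumptions made for illustration.

```python
COMMENT = "Very high results were detected for milk ({milk}) and soy ({soy})."


def milk_soy_rule(r: dict) -> bool:
    """True only for the one combination: milk and soy very high, milk first."""
    return (
        10 <= r["milk"] <= 50
        and 10 <= r["soy"] <= 50
        and r["milk"] > r["soy"]
        and r["peanut"] == 0
        and r["wheat"] == 0
        and r["egg"] == 0
    )


def comment_for(r: dict):
    """Return the comment if the rule fires, otherwise None."""
    return COMMENT.format(**r) if milk_soy_rule(r) else None


case = {"peanut": 0, "soy": 24.3, "milk": 30.1, "wheat": 0, "egg": 0}
print(comment_for(case))
```

This rule covers exactly one pattern of results. A case in which soy is higher than milk, or in which egg is also positive, needs a separate rule and a separate comment.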

In this simple example, in which at most five allergens are tested, the number of combinations of the above comment is 2^5 = 32 (ignoring the order of importance). A different rule needs to exist for each combination of test results.

Even for this simple comment, it is clearly impractical to define each of the 32 possible combinations of the comment and its corresponding rule individually. And real-world examples are far more complex than this.

For an allergy knowledge base, there are literally hundreds of possible tests that can be performed in an investigation, each measuring the same chemical (IgE), with the value of each test indicating the patient's response to a particular allergen. When an investigation contains hundreds of tests, it is impossible for an expert to define the interactions that can occur between the test results and to provide the large number of comment variations that an accurate report requires. The data complexity of this domain must be reduced substantially before an interpretive knowledge base can be defined.

However, the computational challenge of generating reports that take highly complex data into account is beyond the capability of conventional expert systems. For example, if there are 400 tests, then even if each test has only two possible outcomes, such as "positive" or "negative", there are 2^400 possible combinations of test results, each requiring a unique textual report conclusion to be generated in advance and stored on the computer system. This does not even take into account the interactions that can occur between test data, or other relevant inputs such as clinical notes, which complicate the situation considerably. Conventional approaches to interpreting complex data are not feasible when there are hundreds of findings or more.
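
The scale of the problem is easy to check numerically. The helper `result_combinations` below is a hypothetical name used only for this sketch; it assumes the tests are independent and each has a fixed number of outcomes.

```python
def result_combinations(n_tests: int, outcomes_per_test: int = 2) -> int:
    """Number of distinct result patterns for n_tests independent tests."""
    return outcomes_per_test ** n_tests


five = result_combinations(5)            # the five-allergen example
four_hundred = result_combinations(400)  # the 400-test example
digits = len(str(four_hundred))          # decimal digits in 2**400

print(five)
print(digits)
```

For five tests the count is 32. For 400 tests the count is a number with 121 decimal digits, so no system could generate and store one conclusion per combination in advance.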

In a clinical setting, the variety of cases and corresponding reports can be enormous even when the number of tests is not very large, and all the more so when patient history information and clinical notes are also taken into account.

The second more specific example is an airline ticketing application, in which tickets may be issued directly by an airline or indirectly through a travel agent, a discount ticket seller, or an online travel website. When a ticket needs to be reissued (for example, because of a change in travel plans, or to replace a lost or damaged ticket), the details of the original transaction need to be checked against the fare table (the document setting out the terms and conditions that affect the ticket) and against the original transaction record (for example, the amount paid, the number of tickets purchased, the transaction currency, the passenger's name, and the date and place of purchase). A particular difficulty is that an airline fare table is a complex textual data item. It does not follow any well-defined format but nevertheless contains certain important information. This information is often expressed as a number of Key Terms, such as "cancel", "before travel", and "lost ticket", as well as monetary values and dates. Each Key Term may appear in a variety of forms within one fare table and across fare tables. For example, "free of charge", "foc", and "no penalty" all mean the same thing.
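
One way to picture Key Terms is as a table that maps each surface form to a single code. The sketch below is an assumption-laden illustration: the codes (`NO_FEE`, `CANCEL`, and so on), the table `SYNONYMS`, and the function `key_terms` are invented for this example and cover only the phrases quoted above.

```python
import re

# Hypothetical synonym table: surface form -> Key Term code.
SYNONYMS = {
    "free of charge": "NO_FEE",
    "foc": "NO_FEE",
    "no penalty": "NO_FEE",
    "cancel": "CANCEL",
    "before travel": "BEFORE_TRAVEL",
    "lost ticket": "LOST_TICKET",
}

# Longest forms first, so multi-word phrases are tried before shorter ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(s) for s in sorted(SYNONYMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def key_terms(text: str) -> list:
    """Return Key Term codes in the order they appear in the free text."""
    return [SYNONYMS[m.group(1).lower()] for m in _PATTERN.finditer(text)]


print(key_terms("Cancel before travel: FOC"))
print(key_terms("Lost ticket, no penalty"))
```

With such a table, the three spellings of "no fee" collapse into one code, so a rule written against the code does not need to know which spelling a particular fare table used.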

Each fare table not only contains Key Terms but also specifies particular information, such as the penalty for cancellation before travel and the penalty for a lost ticket. Each of these Key Concepts is expressed in a variety of different ways using the Key Terms.

In the above example, therefore, it is necessary to analyze a block of free text containing relevant information expressed in a variety of ways, and then to analyze the information from the free text together with other data in order to reach a conclusion. A similar problem arises in medical diagnosis, where clinical notes may contain important information expressed in free text that must be interpreted in conjunction with pathology tests and demographic data.

The difficulties in interpreting a block of free text include the following:
(a) The difficulty of extracting the significant features from a block of free text so that rules can be built using one or more of those features.
(b) The difficulty a knowledge base has in handling even minor variations of a free text block. If the text data in a free text block is not exactly the same as the text for which the rules were built, those rules may not be general enough to still apply to the new free text block.
(c) The difficulty a knowledge base has in handling different representations of the significant features themselves, within one free text block or across free text blocks.
(d) The need to build rules based on a free text block that contains multiple Key Terms and possibly encompasses several higher-level Key Concepts. A "Key Concept" is a significant feature embedded in the free text that is used by a specialist or an expert system when making an interpretation. A Key Concept is a higher-level unique code that represents a sequence of Key Terms (a Key Term sequence). Several variants of a Key Term sequence may map to a single Key Concept.
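
The relationship described in item (d), several Key Term sequences mapping to one Key Concept, can be sketched as a lookup over sequences of codes. Everything named here (the codes, the table `KEY_CONCEPTS`, the function `key_concepts`) is a hypothetical illustration, not the mechanism defined by this document.

```python
# Hypothetical table: Key Term sequence -> Key Concept code.
# Two word orders of the same idea map to the same Key Concept.
KEY_CONCEPTS = {
    ("CANCEL", "BEFORE_TRAVEL", "NO_FEE"): "CANCEL_BEFORE_TRAVEL_FREE",
    ("BEFORE_TRAVEL", "CANCEL", "NO_FEE"): "CANCEL_BEFORE_TRAVEL_FREE",
    ("LOST_TICKET", "NO_FEE"): "LOST_TICKET_FREE",
}


def key_concepts(terms: list) -> list:
    """Scan a list of Key Term codes and return the Key Concepts found."""
    found = []
    i = 0
    while i < len(terms):
        for sequence, concept in KEY_CONCEPTS.items():
            if tuple(terms[i:i + len(sequence)]) == sequence:
                found.append(concept)
                i += len(sequence)  # consume the matched sequence
                break
        else:
            i += 1  # no sequence starts here; move on
    return found


print(key_concepts(
    ["CANCEL", "BEFORE_TRAVEL", "NO_FEE", "LOST_TICKET", "NO_FEE"]
))
```

A rule can then test for the single code `CANCEL_BEFORE_TRAVEL_FREE` rather than for every wording that expresses it.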

In summary, conventional computer-implemented expert systems used to mimic the human interpretation process are subject to a number of limitations when used to interpret complex data, including the following:
(a) Very large numbers of data values are difficult to interpret, because when a very large number of data values must be considered in order to reach a conclusion or express a judgment (for example, a final diagnosis), the rules governing the interpretation process become excessively complex and unwieldy.
(b) Large, unstructured data item values are difficult to handle, with the result that such complex data cannot be interpreted. Reducing a complex data item to a canonical form, from which simpler, atomic data items and values can be extracted and used in rules and conclusions, is a cumbersome process that creates a long-term burden in maintaining the knowledge base.

Conventional expert systems are therefore limited in their ability to interpret ever-growing volumes of complex data and to convert such data into knowledge or judgments expressed in textual reports. There is a need for a computer-implemented method and system for generating text (such as a textual report) that can interpret large volumes of complex data, including numerical and textual data obtained from different sources and presented in a variety of formats, including free-form text or, alternatively, structured text such as in a "synoptic" report.

本発明の目的は、複雑なデータを解釈して、そのようなデータをテキストレポートで表現される知識又は判断に変換する際における従来のエキスパートシステムの上述した制約を解消する方法及びシステムを提供することである。 It is an object of the present invention to provide a method and system that overcome the above-mentioned limitations of conventional expert systems in interpreting complex data and converting such data into knowledge or judgment expressed in text reports.

本発明の第１の態様によれば、複数のデータ項目から情報を生成する方法が提供される。この方法は、
（ａ）複数のデータ項目の少なくとも一つを用いて集約データ項目（ａｇｇｒｅｇａｔｅｄａｔａｉｔｅｍ）をポピュレート（ｐｏｐｕｌａｔｅ）する工程と、
（ｂ）集約データ項目を使用して情報を生成する工程と
を含み、
集約データ項目は、派生属性（ｄｅｒｉｖｅｄａｔｔｒｉｂｕｔｅ）の一形式であり、
一つの派生属性は、前記複数のデータ項目から一つ以上の上位概念が抽出され、それにより、情報を生成する際に、凝縮された量のより関連するデータを検討できるように、式を使用して前記複数のデータ項目から構築されるデータ項目であり、
情報を生成する当該方法が、決定支援システムによって実行され、
生成された情報が、以下のグループ
ｉ．テキスト情報、
ｉｉ．機械命令
のうちの一つ以上に属する。 According to a first aspect of the present invention, there is provided a method for generating information from a plurality of data items. The method comprises:
(a) populating an aggregate data item using at least one of the plurality of data items; and
(b) generating information using the aggregate data item,
wherein the aggregate data item is a form of derived attribute,
a derived attribute is a data item constructed from the plurality of data items using a formula, such that one or more higher-level concepts are extracted from the plurality of data items and a condensed amount of more relevant data can thereby be considered when generating the information,
the method of generating information is performed by a decision support system, and
the generated information belongs to one or more of the following groups:
i. text information;
ii. machine instructions.
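The two steps of the first aspect can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names DataItem, populate_aggregate and generate_text, the 0.35 cut-off, and the sample values are all hypothetical. The formula passed to populate_aggregate stands in for the formula that builds the derived attribute.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataItem:
    identifier: str  # hypothetical test code
    label: str       # human-readable name used in the generated text
    value: float

def populate_aggregate(items, formula):
    """Step (a): build a derived attribute holding only the relevant items."""
    return [item for item in items if formula(item)]

def generate_text(aggregate, heading):
    """Step (b): generate text information from the aggregate data item."""
    if not aggregate:
        return f"No {heading}."
    labels = ", ".join(item.label for item in aggregate)
    return f"{heading.capitalize()}: {labels}."

items = [
    DataItem("ige_peanut", "peanut", 12.4),
    DataItem("ige_egg", "egg", 0.1),
    DataItem("ige_milk", "milk", 3.2),
]
# Assumed formula: keep results at or above an illustrative cut-off of 0.35.
elevated = populate_aggregate(items, lambda item: item.value >= 0.35)
report = generate_text(elevated, "elevated results")
```

The generator sees only the condensed aggregate, not the full list of data items, which is the point of extracting a higher-level concept first.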

本発明の第２の態様によれば、複数のデータ項目から情報を生成するシステムが提供される。このシステムは、
（ａ）プリプロセッサであって、
ｉ．複数のデータ項目の少なくとも一つを用いて集約データ項目をポピュレートし、かつ
ｉｉ．複数のデータ項目から一つ以上の他の派生属性を構築する
プリプロセッサと、
（ｂ）前記集約データ項目を使用する派生属性及び他の派生属性を使用して情報を生成する情報生成器と
を備え、
情報生成器が、決定支援システムの少なくとも一部を成し、
前記集約データ項目が、派生属性の一形式であり、
一つの派生属性は、前記複数のデータ項目から一つ以上の上位概念が抽出され、それにより、情報を生成する際に、凝縮された量のより関連するデータを検討できるように、式を使用して前記複数のデータ項目から構築されるデータ項目であり、
生成された情報が、以下のグループ
ｉ．テキスト情報
ｉｉ．機械命令
のうちの一つ以上に属する。 According to a second aspect of the present invention, there is provided a system for generating information from a plurality of data items. The system comprises:
(a) a preprocessor that
i. populates an aggregate data item using at least one of the plurality of data items, and
ii. constructs one or more other derived attributes from the plurality of data items; and
(b) an information generator that generates information using a derived attribute that uses the aggregate data item and using the other derived attributes,
wherein the information generator forms at least part of a decision support system,
the aggregate data item is a form of derived attribute,
a derived attribute is a data item constructed from the plurality of data items using a formula, such that one or more higher-level concepts are extracted from the plurality of data items and a condensed amount of more relevant data can thereby be considered when generating the information, and
the generated information belongs to one or more of the following groups:
i. text information;
ii. machine instructions.

一実施形態では、この方法は、所定の構造の複数の要素の各々を、前記データ値のうちの対応する一つでポピュレートする前工程を含む。この構造は、前記複数の要素の各々を集約データ項目に関連付ける。この方法は、この構造に従って、前記複数のデータ項目の少なくとも一つを用いて集約データ項目をポピュレートしてもよい。 In one embodiment, the method includes a previous step of populating each of a plurality of elements of a predetermined structure with a corresponding one of the data values. This structure associates each of the plurality of elements with an aggregate data item. The method may populate aggregated data items using at least one of the plurality of data items according to this structure.

一実施形態では、この方法は、前記構造を処理して、集約データ項目の一つ以上の特性を決定する工程を含む。 In one embodiment, the method includes processing the structure to determine one or more characteristics of the aggregate data item.

一実施形態では、前記構造は、複数の集約データ項目を相互に関連させ、この方法は、前記構造を処理して、前記複数の集約データ項目の各々の一つ以上の特性を決定する工程を含む。 In one embodiment, the structure interrelates a plurality of aggregate data items, and the method includes processing the structure to determine one or more characteristics of each of the plurality of aggregate data items.

一実施形態では、前記情報はテキスト情報を含む。前記情報は、臨床決定支援情報であってもよい。この他に、前記情報は機械命令を含む。 In one embodiment, the information includes text information. The information may be clinical decision support information. Alternatively, the information includes machine instructions.

一実施形態では、情報を生成する工程は、知識ベース又は決定支援システムを使用して情報を生成する工程を含む。 In one embodiment, generating information includes generating information using a knowledge base or decision support system.

一実施形態では、集約データ項目をポピュレートする工程は、複数のデータ項目を受け取る工程を含む。一実施形態では、テキスト情報は、構文法的及び／又は文法的に正しいものであってもよい。テキスト情報は、人間が読むことのできるものであってもよい。一実施形態では、テキスト情報は、レポートの少なくとも一部を成していてもよい。レポートは、一つ以上の検査結果に関連付けられていてもよい。一実施形態では、各データ項目は、検査結果の一つに対応することができる。検査は、アレルギー検査、白血病検査、病理学検査、血液検査、及び任意の他の種類の医療検査のいずれを含んでいてもよい。データ項目は、任意の他の情報、例えば、性別、年齢、人口統計情報、臨床症状に対応していてもよい。 In one embodiment, populating the aggregated data item includes receiving a plurality of data items. In one embodiment, the text information may be syntactically and / or grammatically correct. The text information may be human readable. In one embodiment, the text information may form at least part of the report. The report may be associated with one or more test results. In one embodiment, each data item can correspond to one of the test results. The tests may include any of allergy tests, leukemia tests, pathology tests, blood tests, and any other type of medical test. The data item may correspond to any other information, such as gender, age, demographic information, clinical symptoms.

一実施形態では、集約データ項目は、互いに関連する複数のデータ項目を含む。一実施形態では、集約データ項目をポピュレートする工程は、これら複数のデータ項目の少なくとも一つ又は別の集約データ項目に一のルールを適用することによって、集約データ項目をポピュレートする工程を含む。このルールは、分野に特有のルールであってもよい。この他に、このルールは、ケースに特有のルールであってもよい。このルールは、前記複数のデータ項目の一つ以上を用いて集約データ項目をポピュレートするためのものであってもよい。ルールは、閾値を上回る一つ以上の前記データ項目を用いて集約データ項目をポピュレートするためのものであってもよい。ルールは、閾値を下回る一つ以上の前記データ項目を用いて集約データ項目をポピュレートするためのものであってもよい。 In one embodiment, the aggregate data item includes a plurality of data items associated with each other. In one embodiment, populating the aggregated data item includes populating the aggregated data item by applying a rule to at least one of the plurality of data items or another aggregated data item. This rule may be a rule specific to the field. In addition, this rule may be a rule specific to the case. This rule may be for populating aggregated data items using one or more of the plurality of data items. The rule may be for populating aggregate data items using one or more of the data items above a threshold. The rule may be for populating aggregated data items using one or more of the data items below a threshold.

一実施形態では、この方法は、（ａ）複数のデータ項目のうちの少なくとも一つを用いて一つ以上の集約データ項目をポピュレートする工程と、（ｂ）前記一つ以上の集約データ項目に一つ以上のルールを適用することによって、前記一つ以上の集約データ項目からのデータ項目を用いて一つ以上の更なる集約データ項目をポピュレートする工程と、（ｃ）一つ以上の更なる集約データ項目を使用して情報を生成する工程とを含む。 In one embodiment, the method includes (a) populating one or more aggregated data items using at least one of the plurality of data items; and (b) the one or more aggregated data items. Populating one or more further aggregated data items using data items from the one or more aggregated data items by applying one or more rules; and (c) one or more further Generating information using the aggregated data items.
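Steps (a) to (c) above, together with the threshold rules of the preceding embodiment, might look like the following Python sketch. It is an illustration only: the helper names above, below and populate, the thresholds 0.35 and 3.0, and the sample results are assumptions, and an aggregate is modelled simply as a dict from data item name to value.

```python
def above(threshold):
    """Rule selecting values above a threshold."""
    return lambda value: value > threshold

def below(threshold):
    """Rule selecting values below a threshold."""
    return lambda value: value < threshold

def populate(source, rule):
    """Populate an aggregate from data items or from another aggregate."""
    return {name: value for name, value in source.items() if rule(value)}

results = {"peanut": 12.4, "egg": 0.1, "milk": 3.2, "wheat": 0.6}

positive = populate(results, above(0.35))            # step (a)
strongly_positive = populate(positive, above(3.0))   # step (b): aggregate built from an aggregate
negative = populate(results, below(0.35))

# Step (c): generate information from the further aggregate.
summary = (
    f"{len(strongly_positive)} of {len(positive)} positive results "
    "are strongly positive."
)
```

Because the second rule is applied to the first aggregate rather than to the raw data items, each rule stays small even as the number of data items grows.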

一実施形態では、複数のデータ項目の各々は、識別子及び値に関連付けられる。複数のデータ項目の各々は、その識別子及び値を含んでいてもよい。この識別子は、データ項目の名称又はラベルに関連付けられてもよい。一実施形態では、情報を生成する工程は、集約データ項目をポピュレートするデータ項目の名称又はラベルを情報に含める工程を含む。情報を生成する工程は、集約データ項目をポピュレートするデータ項目に関連付けられた値を情報に含める工程を含んでいてもよい。情報を生成する工程は、情報内における名称又はラベルの順序を決定する工程を含んでいてもよい。 In one embodiment, each of the plurality of data items is associated with an identifier and a value. Each of the plurality of data items may include its identifier and value. This identifier may be associated with the name or label of the data item. In one embodiment, generating the information includes including in the information the name or label of the data item that populates the aggregated data item. The step of generating information may include the step of including in the information a value associated with the data item that populates the aggregated data item. The step of generating information may include the step of determining the order of names or labels in the information.
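As an illustration of the identifier, label, and value handling described above, the following Python sketch includes the label and value of each data item in the generated text and determines their order. The DataItem class, the describe function, the choice to order by descending value, and the sample codes and values are assumptions made for this example; the source does not prescribe them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataItem:
    identifier: str  # hypothetical code identifying the data item
    label: str       # name or label associated with the identifier
    value: float

def describe(aggregate):
    """List label and value of each item, highest value first (assumed order)."""
    ordered = sorted(aggregate, key=lambda item: item.value, reverse=True)
    return ", ".join(f"{item.label} ({item.value:g})" for item in ordered)

positive = [
    DataItem("f13", "peanut", 12.4),
    DataItem("f1", "egg white", 0.9),
    DataItem("f2", "milk", 3.2),
]
sentence = f"Positive results: {describe(positive)}."
```

Other orderings (alphabetical, or the order in which results were received) would only require a different sort key.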

一実施形態では、情報を生成する工程は、集約データ項目の一つ以上の特性を決定する工程を含む。特性を決定する該工程は、集約データ項目を構成するデータ項目の数を決定する工程、集約データ項目が空であるかどうかを決定する工程、集約データ項目が特定のデータ項目を含むかどうかを決定する工程、集約データ項目が特定のデータ項目を含まないかどうかを決定する工程、及び複数の集約データ項目がデータ項目を共有するかどうかを決定する工程のうちの一つ以上を含んでいてもよい。特性の一つは、値であってもよい。 In one embodiment, generating the information includes determining one or more characteristics of the aggregate data item. The step of determining characteristics includes determining the number of data items that make up the aggregated data item, determining whether the aggregated data item is empty, and whether the aggregated data item includes a particular data item. Including one or more of determining, determining whether the aggregated data item does not include a particular data item, and determining whether multiple aggregated data items share the data item Also good. One of the characteristics may be a value.
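The characteristics listed above can each be computed with a one-line check. The Python sketch below models an aggregate data item as a set of data item identifiers; the function names and the sample aggregates are hypothetical and chosen only for illustration.

```python
def count(aggregate):
    """Number of data items that make up the aggregate."""
    return len(aggregate)

def is_empty(aggregate):
    """Whether the aggregate contains no data items."""
    return not aggregate

def contains(aggregate, identifier):
    """Whether the aggregate includes a particular data item."""
    return identifier in aggregate

def share_items(*aggregates):
    """Whether all of the given aggregates have at least one data item in common."""
    return bool(set.intersection(*map(set, aggregates)))

nuts = {"peanut", "cashew"}
foods = {"peanut", "egg"}
grasses = set()

facts = {
    "nut_count": count(nuts),
    "no_grasses": is_empty(grasses),
    "has_peanut": contains(nuts, "peanut"),
    "lacks_egg": not contains(nuts, "egg"),
    "overlap": share_items(nuts, foods),
}
```

Values such as these could then serve as conditions for rules or be included directly in the generated information.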

一実施形態では、情報を生成する工程は、集約データ項目の決定された特性を情報に含める工程を含む。 In one embodiment, generating the information includes including the determined characteristics of the aggregate data item in the information.

一実施形態では、集約データ項目をポピュレートする工程は、一つ以上の他の集約データ項目を用いて集約データ項目をポピュレートする工程を含んでいてもよい。この集約データ項目と一つ以上の集約データ項目の各々とは、一つの集約識別子に関連付けられていてもよい。集約識別子は各々、集約名に関連付けられていてもよい。情報を生成する工程は、集約名を含める工程を含んでいてもよい。集約名を情報に含める工程は、情報内における集約名の順序を決定する工程を含んでいてもよい。 In one embodiment, populating the aggregate data item may include populating the aggregate data item with one or more other aggregate data items. The aggregate data item and each of the one or more aggregate data items may be associated with one aggregate identifier. Each aggregation identifier may be associated with an aggregation name. The step of generating information may include a step of including an aggregate name. The step of including the aggregate name in the information may include a step of determining the order of the aggregate name in the information.

一実施形態では、集約データ項目をポピュレートする工程は、二つの他の集約データ項目に演算を施す工程を含む。演算を施す工程は、二つの他の集約データ項目の差、和、及び積のうちの一つ以上を含んでいてもよい。一実施形態では、集約データ項目をポピュレートする工程は、別の集約データ項目を構成するどのデータ項目が、特定の範囲内の値又は特定の離散集合に属する値を有するかを決定する工程を含む。 In one embodiment, populating the aggregate data item includes performing an operation on two other aggregate data items. The operation may include one or more of the difference, the sum, and the product of the two other aggregate data items. In one embodiment, populating the aggregate data item includes determining which data items that make up another aggregate data item have a value within a specific range or a value belonging to a specific discrete set.
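One possible reading of these operations is set-like: the difference, sum, and product of two aggregates correspond to set difference, union, and intersection of their data items. That reading is an assumption of this Python sketch, as are the function names and sample values; aggregates are modelled as dicts from data item name to value.

```python
def difference(a, b):
    """Items of a that are not in b."""
    return {k: v for k, v in a.items() if k not in b}

def union(a, b):
    """Items in either aggregate (the 'sum' under the assumed reading)."""
    return {**a, **b}

def intersection(a, b):
    """Items in both aggregates (the 'product' under the assumed reading)."""
    return {k: v for k, v in a.items() if k in b}

def in_range(aggregate, low, high):
    """Items whose value lies within the inclusive range [low, high]."""
    return {k: v for k, v in aggregate.items() if low <= v <= high}

def in_set(aggregate, allowed):
    """Items whose value belongs to a discrete set."""
    return {k: v for k, v in aggregate.items() if v in allowed}

foods = {"peanut": 12.4, "egg": 0.9, "milk": 3.2}
dairy_and_grain = {"milk": 3.2, "wheat": 0.6}

only_foods = difference(foods, dairy_and_grain)
everything = union(foods, dairy_and_grain)
common = intersection(foods, dairy_and_grain)
moderate = in_range(foods, 1.0, 5.0)
answered = in_set({"sex": "F", "smoker": "yes"}, {"yes", "no"})
```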

一実施形態では、情報を生成する工程は、集約データ項目に一つ以上のルールを適用する工程を含む。これら一つ以上のルールは、リップルダウンルール知識システムの少なくとも一部を成していてもよい。 In one embodiment, generating the information includes applying one or more rules to the aggregate data item. These one or more rules may form at least part of a ripple down rule knowledge system.

一実施形態では、情報を生成する工程は、派生属性を構成する（一つ以上の）データ項目の（一つ以上の）識別子を情報に含める工程を含む。 In one embodiment, generating the information includes including in the information the identifier (s) of the data item (s) that make up the derived attribute.

一実施形態では、情報を生成する工程は、データ項目を構成するデータ項目の値を情報に含める工程を含む。 In one embodiment, generating the information includes including in the information the values of the data items that make up the data items.

一実施形態では、情報を生成する工程は、情報内におけるデータ項目識別子の順序を決定する工程を含む。 In one embodiment, generating information includes determining the order of data item identifiers within the information.

一実施形態では、この方法は、情報の概念表現を構築する工程を更に含み、
概念表現は、
（ａ）複数のデータ項目、
（ｂ）一つ以上の派生属性、
（ｃ）一つ以上のルールの以前の反復において評価された一つ以上の結論
のうちの一つ以上の解析に基づいて決定支援システムのルールを評価することにより与えられる結論であり、結論の構築は、これらのルールの連続する再評価において行われる。 In one embodiment, the method further comprises building a conceptual representation of the information,
wherein the conceptual representation is a conclusion given by evaluating rules of the decision support system based on an analysis of one or more of:
(a) the plurality of data items;
(b) one or more derived attributes;
(c) one or more conclusions evaluated in a previous iteration of one or more rules,
and the conclusion is built up over successive re-evaluations of these rules.

一実施形態では、この方法は、結論を構築する工程を更に含み、結論の構築は、ルール評価の反復において行われ、連続する各ルール評価は、以前のルール評価において構築された結論を利用する。 In one embodiment, the method further includes building a conclusion, wherein the conclusion is built over iterations of rule evaluation, and each successive rule evaluation makes use of the conclusions built in the previous rule evaluations.
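The iterative construction of conclusions can be sketched as a simple repeat-until-stable loop. The Python example below is a plain forward-chaining illustration and not a ripple-down rules engine; the evaluate function, the rule format (a condition paired with a conclusion), the max_passes limit, and the sample rules are all assumptions.

```python
def evaluate(rules, facts, max_passes=10):
    """Re-evaluate the rules until no new conclusion is added."""
    conclusions = set()
    for _ in range(max_passes):
        # Each pass may use conclusions built in earlier passes.
        new = {
            conclusion
            for condition, conclusion in rules
            if condition(facts, conclusions)
        } - conclusions
        if not new:
            break
        conclusions |= new
    return conclusions

facts = {"positive": {"peanut", "cashew"}}

rules = [
    # Depends on a conclusion, so it can only fire on a later pass.
    (lambda f, c: "multiple sensitisation" in c, "refer to specialist"),
    # Depends on a characteristic of an aggregate data item.
    (lambda f, c: len(f["positive"]) >= 2, "multiple sensitisation"),
]

conclusions = evaluate(rules, facts)
```

In this run the second rule fires on the first pass and the first rule fires on the second pass, which shows a later evaluation using a conclusion from an earlier one.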

一実施形態では、複数のデータ項目から情報を生成するシステムであって、そのシステムは、複数のデータ項目のうちの少なくとも一つを用いて集約データ項目をポピュレートする集約データ項目ポピュレータと、集約データ項目を使用して情報を生成する情報生成器とを備える。 In one embodiment, there is provided a system for generating information from a plurality of data items, the system comprising an aggregate data item populator that populates an aggregate data item using at least one of the plurality of data items, and an information generator that generates information using the aggregate data item.

一実施形態では、情報生成器は、テキスト情報を生成するテキスト情報生成器である。あるいは、情報生成器は、機械命令を生成する機械命令生成器である。 In one embodiment, the information generator is a text information generator that generates text information. Alternatively, the information generator is a machine instruction generator that generates machine instructions.

一実施形態では、このシステムは、知識ベース又は決定支援システムを備える。一実施形態では、システムは、複数のデータ項目を受け取るデータ項目受信機を備える。情報生成器は、構文法的及び／又は文法的に正しいテキスト情報を生成するように構成されていてもよい。情報生成器は、人間が読めるテキスト情報を生成するように構成されていてもよい。情報生成器は、符号化テキスト情報を生成するように構成されていてもよい。符号化テキスト情報は、機械命令であってもよい。 In one embodiment, the system comprises a knowledge base or decision support system. In one embodiment, the system comprises a data item receiver that receives a plurality of data items. The information generator may be configured to generate syntactically and / or grammatically correct text information. The information generator may be configured to generate human readable text information. The information generator may be configured to generate encoded text information. The encoded text information may be machine instructions.

一実施形態では、複数のデータ項目から情報を生成するシステムであって、そのシステムは、
（ａ）プリプロセッサであって、
ｉ．複数のデータ項目の少なくとも一つを用いて集約データ項目をポピュレートし、かつ
ｉｉ．複数のデータ項目から一つ以上の他の派生属性を構築する
プリプロセッサと、
（ｂ）集約データ項目を使用する派生属性及び他の派生属性を使用して情報を生成する情報生成器と
を備え、
情報生成器が、決定支援システムの少なくとも一部を成し、
集約データ項目が、派生属性の一形式であり、
一つの派生属性は、前記複数のデータ項目から一つ以上の上位概念が抽出され、それにより、情報を生成する際に、凝縮された量のより関連するデータを検討できるように、式を使用して前記複数のデータ項目から構築されるデータ項目であり、
生成された情報が、以下のグループ
ｉ．テキスト情報
ｉｉ．機械命令
のうちの一つ以上に属する。 In one embodiment, there is provided a system for generating information from a plurality of data items, the system comprising:
(a) a preprocessor that
i. populates an aggregate data item using at least one of the plurality of data items, and
ii. constructs one or more other derived attributes from the plurality of data items; and
(b) an information generator that generates information using a derived attribute that uses the aggregate data item and using the other derived attributes,
wherein the information generator forms at least part of a decision support system,
the aggregate data item is a form of derived attribute,
a derived attribute is a data item constructed from the plurality of data items using a formula, such that one or more higher-level concepts are extracted from the plurality of data items and a condensed amount of more relevant data can thereby be considered when generating the information, and
the generated information belongs to one or more of the following groups:
i. text information;
ii. machine instructions.

一実施形態では、情報生成器は、レポートの少なくとも一部を成すテキスト情報を生成するように構成されていてもよい。一実施形態では、集約データポピュレータは、複数のデータ項目のうちの少なくとも一つにルールを適用することによって集約データ項目をポピュレートするように構成されていてもよい。 In one embodiment, the information generator may be configured to generate text information that forms at least part of the report. In one embodiment, the aggregate data populator may be configured to populate the aggregate data item by applying a rule to at least one of the plurality of data items.

一実施形態では、情報生成器は、集約データ項目をポピュレートするデータ項目に関連付けられた名称又はラベルを情報に含めるように構成される。情報生成器は、集約データ項目をポピュレートするデータ項目に関連付けられた値を情報に含めるように構成されていてもよい。情報生成器は、テキスト内における名称又はラベルの順序を決定するように構成されていてもよい。 In one embodiment, the information generator is configured to include in the information a name or label associated with the data item that populates the aggregate data item. The information generator may be configured to include in the information values associated with the data items that populate the aggregated data items. The information generator may be configured to determine the order of names or labels within the text.

一実施形態では、システムは、解釈部分を含む情報の概念表現を構築するビルダー（ｂｕｉｌｄｅｒ）を更に備えており、解釈部分は、複数のデータ項目を含む集約データ項目に対する演算を表す。 In one embodiment, the system further comprises a builder that builds a conceptual representation of information that includes an interpretation portion, wherein the interpretation portion represents an operation on an aggregate data item that includes a plurality of data items.

一実施形態では、情報生成器は、集約データ項目の特性を決定するように構成される。情報生成器は、集約データ項目を構成するデータ項目の数を決定すること、集約データ項目が空かどうかを決定すること、及び集約データ項目が特定のデータ項目を含むかどうかを決定することのうちの一つ以上を含むように構成されていてもよい。 In one embodiment, the information generator is configured to determine characteristics of the aggregate data item. The information generator is capable of determining the number of data items that make up the aggregate data item, determining whether the aggregate data item is empty, and determining whether the aggregate data item includes a particular data item. You may be comprised so that one or more of them may be included.

一実施形態では、情報生成器は、集約データ項目の決定された特性を情報に含めるように構成される。 In one embodiment, the information generator is configured to include the determined characteristic of the aggregate data item in the information.

一実施形態では、集約データポピュレータは、一つ以上の他の集約データ項目を用いて集約データ項目をポピュレートするように構成される。集約データポピュレータは、集約データ項目に関連付けられた集約データ項目名をテキストに含めるように構成されていてもよい。集約データポピュレータは、テキスト内における集約データ項目名の順序を決定するように構成されていてもよい。 In one embodiment, the aggregate data populator is configured to populate the aggregate data item with one or more other aggregate data items. The aggregate data populator may be configured to include in the text the aggregate data item name associated with the aggregate data item. The aggregate data populator may be configured to determine the order of aggregate data item names in the text.

一実施形態では、集約データ項目ポピュレータは、二つの他の集約データ項目に演算を施すように構成される。 In one embodiment, the aggregate data item populator is configured to operate on two other aggregate data items.

一実施形態では、集約データ項目ポピュレータは、別の集約データ項目を構成するどのデータ項目が特定の範囲内の値を有するかを決定するように構成される。一実施形態では、情報生成器は、集約データ項目に一つ以上のルールを適用するように構成される。 In one embodiment, the aggregate data item populator is configured to determine which data items that make up another aggregate data item have a value within a particular range. In one embodiment, the information generator is configured to apply one or more rules to the aggregate data item.

一実施形態では、情報生成器は、検査が実行された局所特性を考慮するように構成される。 In one embodiment, the information generator is configured to take into account the local characteristics for which the inspection was performed.

一実施形態では、複数のデータ項目から情報を生成する方法が提供され、その方法は、各々がデータ項目の一つ以上を含む一つ以上の集約データ項目を使用して、一つ以上のルールの結果を評価する工程と、その結果に従って情報を生成する工程とを含む。 In one embodiment, a method for generating information from a plurality of data items is provided, the method comprising evaluating the result of one or more rules using one or more aggregate data items, each of which includes one or more of the data items, and generating information according to the result.

一実施形態では、情報はテキスト情報を含む。あるいは、情報は機械命令を含む。 In one embodiment, the information includes text information. Alternatively, the information includes machine instructions.

一実施形態では、一つ以上のルールの結果を評価する工程は、これらのルールのうちの一つ以上のための基礎として、集約データ項目の特性を使用する工程を含む。 In one embodiment, evaluating the results of one or more rules includes using the characteristics of the aggregate data item as a basis for one or more of these rules.

一実施形態では、複数のデータ項目から情報を生成するシステムが提供される。そのシステムは、各々が複数のデータ項目の一つ以上を含む一つ以上の集約データ項目を使用して一つ以上のルールの結果を評価する評価器と、該結果に従って情報を生成する情報生成器とを備える。 In one embodiment, a system for generating information from a plurality of data items is provided. The system comprises an evaluator that evaluates the result of one or more rules using one or more aggregate data items, each of which includes one or more of the plurality of data items, and an information generator that generates information according to the result.

一実施形態では、システムは、知識ベース又は決定支援システムを備える。 In one embodiment, the system comprises a knowledge base or decision support system.

一実施形態では、情報を生成する方法が提供され、その方法は、解釈部分を含む情報の概念表現を受け取る工程であって、解釈部分が、複数のデータ項目を含む集約データ項目に対する演算を表す工程と、解釈部分から情報を生成する工程とを含む。 In one embodiment, a method for generating information is provided, the method comprising receiving a conceptual representation of the information that includes an interpretation portion, the interpretation portion representing an operation on an aggregate data item that includes a plurality of data items, and generating information from the interpretation portion.

一実施形態では、情報はテキスト情報である。あるいは、情報は機械命令である。 In one embodiment, the information is text information. Alternatively, the information is a machine instruction.

一実施形態では、情報を生成する工程は、データ項目の各々に関連付けられた一つ以上の名称又はラベルを情報に含める工程を含む。情報を生成する工程は、複数のデータ項目の集団名を情報に含める工程を含んでいてもよい。一つ以上の名称又はラベルを含める工程は、情報をテキストの概念表現のリテラル部分と統合する工程を含んでいてもよい。 In one embodiment, generating information includes including in the information one or more names or labels associated with each of the data items. The step of generating information may include a step of including a group name of a plurality of data items in the information. Including one or more names or labels may include integrating information with a literal portion of a conceptual representation of the text.

一実施形態では、概念表現は疑似テキストである。 In one embodiment, the conceptual representation is pseudo text.
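A pseudo-text conceptual representation can be pictured as a template whose literal portions are copied through and whose interpretation portions are replaced by the result of an operation on an aggregate data item. The Python sketch below assumes a hypothetical placeholder syntax, {operation:aggregate}, with two operations, count and list; neither the syntax nor the function names come from the source.

```python
import re

def join_labels(labels):
    """Join labels into a grammatical English list: 'a', 'a and b', 'a, b and c'."""
    labels = list(labels)
    if len(labels) <= 1:
        return "".join(labels)
    return ", ".join(labels[:-1]) + " and " + labels[-1]

def render(pseudo_text, aggregates):
    """Expand each interpretation portion; literal portions pass through unchanged."""
    def expand(match):
        operation, name = match.group(1), match.group(2)
        aggregate = aggregates[name]
        if operation == "count":
            return str(len(aggregate))
        if operation == "list":
            return join_labels(aggregate)
        raise ValueError(f"unknown operation: {operation}")
    return re.sub(r"\{(\w+):(\w+)\}", expand, pseudo_text)

pseudo_text = "Specific IgE was detected for {count:positive} allergens: {list:positive}."
text = render(pseudo_text, {"positive": ["peanut", "egg", "milk"]})
```

Keeping the list-joining logic in one place is what lets the same pseudo-text produce grammatically correct output whether the aggregate holds one item or many.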

一実施形態では、情報を生成する工程は、構文法的及び／又は文法的に正しいテキスト情報を生成する工程を含んでいてもよい。 In one embodiment, generating the information may include generating syntactically and / or grammatically correct text information.

一実施形態では、情報を生成するシステムが提供され、そのシステムは、解釈部分を含む情報の概念表現を受け取る受信機であって、解釈部分が、複数のデータ項目を含む集約データ項目に対する演算を表す、受信機と、解釈部分から情報を生成する情報生成器とを備える。 In one embodiment, a system for generating information is provided, the system comprising a receiver that receives a conceptual representation of the information that includes an interpretation portion, the interpretation portion representing an operation on an aggregate data item that includes a plurality of data items, and an information generator that generates information from the interpretation portion.

一実施形態では、情報はテキスト情報であり、その場合、情報生成器はテキスト情報生成器である。あるいは、情報は機械命令であり、その場合、情報生成器は機械命令生成器である。 In one embodiment, the information is text information, in which case the information generator is a text information generator. Alternatively, the information is a machine instruction, in which case the information generator is a machine instruction generator.

一実施形態では、生成器は、データ項目の各々に関連付けられた一つ以上の名称又はラベルを情報に含めるように構成される。生成器は、複数のデータ項目の集団名を情報に含めるように構成されていてもよい。生成器は、情報をテキストの概念表現のリテラル部分と統合するように構成されていてもよい。 In one embodiment, the generator is configured to include in the information one or more names or labels associated with each of the data items. The generator may be configured to include a group name of a plurality of data items in the information. The generator may be configured to integrate information with the literal part of the conceptual representation of the text.

一実施形態では、生成器は、構文法的及び／又は文法的に正しいテキスト情報を生成するように構成されていてもよい。 In one embodiment, the generator may be configured to generate syntactically and / or grammatically correct text information.

一実施形態では、本発明は、本発明の第１の態様に係る方法を実施するようにコンピュータを制御する命令を含むコンピュータプログラムを提供する。 In one embodiment, the present invention provides a computer program comprising instructions for controlling a computer to perform the method according to the first aspect of the present invention.

一実施形態では、本発明は、本発明の第７の態様に係るコンピュータプログラムを提供するコンピュータ可読媒体を提供する。 In one embodiment, the present invention provides a computer readable medium providing a computer program according to the seventh aspect of the present invention.

一実施形態では、本発明は、本発明の第３の態様に係る方法を実施するようにコンピュータを制御する命令を含むコンピュータプログラムを提供する。 In one embodiment, the present invention provides a computer program comprising instructions for controlling a computer to implement a method according to the third aspect of the present invention.

一実施形態では、本発明は、本発明の第９の態様に係るコンピュータプログラムを提供するコンピュータ可読媒体を提供する。 In one embodiment, the present invention provides a computer readable medium providing a computer program according to the ninth aspect of the present invention.

一実施形態では、本発明は、本発明の第５の態様に係る方法を実施するようにコンピュータを制御する命令を含むコンピュータプログラムを提供する。 In one embodiment, the present invention provides a computer program comprising instructions for controlling a computer to implement a method according to the fifth aspect of the present invention.

一実施形態では、本発明は、本発明の第１１の態様に係るコンピュータプログラムを提供するコンピュータ可読媒体を提供する。本明細書における用語「サーバ」は、クライアント−サーバアーキテクチャの一部において、接続されたクライアントに対するサービスを実行する、ハードウェアとソフトウェアの任意の組合せを包含することが意図されている。 In one embodiment, the present invention provides a computer readable medium providing a computer program according to the eleventh aspect of the present invention. The term “server” herein is intended to encompass any combination of hardware and software that, in part of the client-server architecture, performs services for connected clients.

クライアントとサーバは、単一のハードウェア又は複数の接続されたハードウェア上で動作する別個のソフトウェアとすることができる。 The client and the server can be separate software running on a single piece of hardware or on multiple connected pieces of hardware.

したがって、好ましい一実施形態では、本発明は、異なるソースから取得される数値データ及びテキストデータを含む大量の複雑なデータを解釈することが可能な手段を提供することによって、従来のエキスパートシステムの制限の少なくとも幾つかを克服する、テキスト（テキストレポートなど）を生成するコンピュータ応用方法及びシステムを提供する。一実施形態では、本発明は、自由形式テキストを含む様々な形式で提示されるデータを解釈する手段を更に提供する。 Thus, in a preferred embodiment, the present invention provides a computer-implemented method and system for generating text (such as text reports) that overcomes at least some of the limitations of conventional expert systems by providing a means capable of interpreting large amounts of complex data, including numeric and textual data obtained from different sources. In one embodiment, the present invention further provides a means for interpreting data presented in various formats, including free-form text.

本発明の本質のより良い理解を達成するため、以下では、テキスト情報を生成する方法及びシステムの実施形態を、添付の図面及び実施例を参照しながら、あくまで例示として説明する。 To achieve a better understanding of the nature of the present invention, embodiments of a method and system for generating text information are described below by way of example only with reference to the accompanying drawings and examples.

ここで、例１は、白血病レポート知識ベースの形態をとる好ましい一実施形態に係るテキストを生成する方法及びシステムの一例である。 Here, Example 1 is an example of a method and system for generating text according to a preferred embodiment in the form of a leukemia report knowledge base.

例２は、アレルギーレポート知識ベースの形態をとる好ましい一実施形態に係るテキストを生成する方法及びシステムの別の例である。 Example 2 is another example of a method and system for generating text according to a preferred embodiment that takes the form of an allergy report knowledge base.

例３は、航空券発券監査システムの形態をとる他の実施形態に係るテキストを生成する方法及びシステムの一例である。 Example 3 is an example of a method and system for generating text according to another embodiment that takes the form of an airline ticketing audit system.

例４は、ログファイル監視システムの形態をとる他の実施形態に係るテキストを生成する方法及びシステムの一例である。 Example 4 is an example of a method and system for generating text according to another embodiment taking the form of a log file monitoring system.

テキストのブロックとテキスト正規化属性「ＮｏｒｍＣａｔ」を使用したその「正規形」の一例を示すユーザインタフェースのウィンドウである。示される例は、航空券発券に関係する。 FIG. 1 is a user interface window showing an example of a block of text and its “normal form” obtained using the text normalization attribute “NormCat”. The example shown relates to airline ticketing.
「ＮｏｒｍＣａｔ」におけるキータームのリストと各キータームを定義する正規表現の一例を示すユーザインタフェースのウィンドウである。示される例は、航空券発券に関係する。 FIG. 2 is a user interface window showing an example of a list of key terms in “NormCat” and the regular expressions that define each key term. The example shown relates to airline ticketing.
正規化テキストから通貨及び値を抽出した変数を有するコメントの二つの例を示すユーザインタフェースのウィンドウである。示される例は、航空券発券に関係する。 FIG. 3 is a user interface window showing two examples of comments with variables that extract a currency and a value from normalized text. The example shown relates to airline ticketing.
テキストレポートなどのテキスト又はテキスト情報を生成するシステムの一実施形態のブロック図である。 FIG. 4 is a block diagram of one embodiment of a system for generating text or text information, such as a text report.
テキストレポートなどのテキスト又はテキスト情報を生成する方法の一実施形態のフロー図である。 FIG. 5 is a flow diagram of one embodiment of a method for generating text or text information, such as a text report.
テキストレポートなどのテキスト又はテキスト情報を生成するシステムの別の実施形態のブロック図である。 FIG. 6 is a block diagram of another embodiment of a system for generating text or text information, such as a text report.
テキストレポートなどのテキスト又はテキスト情報を生成する方法の別の実施形態のフロー図である。 FIG. 7 is a flow diagram of another embodiment of a method for generating text or text information, such as a text report.
テキストレポートなどのテキスト又はテキスト情報を生成する方法のまた別の実施形態のフロー図である。 FIG. 8 is a flow diagram of yet another embodiment of a method for generating text or text information, such as a text report.
テキストレポートなどのテキスト又はテキスト情報を生成する方法の第３の実施形態のフロー図である。 FIG. 9 is a flow diagram of a third embodiment of a method for generating text or text information, such as a text report.
ある実施形態の、キータームを定義するテキスト要約器属性（ＴＣＡ：ｔｅｘｔｃｏｎｄｅｎｓｅｒＡｔｔｒｉｂｕｔｅ）の一例を示すユーザインタフェースのウィンドウである。示される例は、航空券発券に関係する。 FIG. 10 is a user interface window showing an example of a text condenser attribute (TCA) that defines key terms, according to one embodiment. The example shown relates to airline ticketing.
キータームとともにキーコンセプトも定義される図１０の実施形態に係るテキスト要約器属性（ＴＣＡ）の一例を示すユーザインタフェースのウィンドウである。示される例は、航空券発券に関係する。 FIG. 11 is a user interface window showing an example of a text condenser attribute (TCA) according to the embodiment of FIG. 10, in which key concepts are defined in addition to key terms. The example shown relates to airline ticketing.
自ら（「ＴＣＡ」）についての値とキーコンセプト「ＣｘＢｔ」、「ＣｘＡｔ」、「ＲｉＯｂ１」及び「ＲｉＲｔｃ」についての値をサンプルケースに与える図１０のテキスト要約器属性（ＴＣＡ）のユーザインタフェースのウィンドウである。示される例は、航空券発券に関係する。 FIG. 12 is a user interface window of the text condenser attribute (TCA) of FIG. 10, which gives a sample case a value for itself (“TCA”) and values for the key concepts “CxBt”, “CxAt”, “RiOb1” and “RiRtc”. The example shown relates to airline ticketing.
キーコンセプトの評価を定める照合用形式（ＭａｔｃｈｉｎｇＦｏｒｍ）の一例を示す図１０のＴＣＡのユーザインタフェースのウィンドウである。ユーザは、各照合用形式のための生テキストの一例を提供するように促される。示される例は、航空券発券に関係する。 FIG. 13 is a user interface window of the TCA of FIG. 10 showing an example of matching forms that determine the evaluation of a key concept. The user is prompted to provide an example of raw text for each matching form. The example shown relates to airline ticketing.
派生マッチ（ＤｅｒｉｖｅｄＭａｔｃｈ）のための照合用形式の一例を示す図１０のＴＣＡのユーザインタフェースのウィンドウである。キーワードが追加されたため、照合用形式は、もはやそれらの例と一致しない。示される例は、航空券発券に関係する。 FIG. 14 is a user interface window of the TCA of FIG. 10 showing an example of matching forms for a derived match. Because keywords have been added, the matching forms no longer match their examples. The example shown relates to airline ticketing.
新しいキーワードが追加された場合、照合用形式がそれらの例の正規化バージョンと一致するように、照合用形式をどのように変更する必要があるかを示す図１０のＴＣＡのユーザインタフェースのウィンドウである。示される例は、航空券発券に関係する。 FIG. 15 is a user interface window of the TCA of FIG. 10 showing how the matching forms need to be changed, when new keywords are added, so that they match the normalized versions of their examples. The example shown relates to airline ticketing.
コメント内の変数においてキーコンセプトをどのように直接使用できるかを示す図１０のＴＣＡのユーザインタフェースのウィンドウである。示される例は、航空券発券に関係する。 FIG. 16 is a user interface window of the TCA of FIG. 10 showing how key concepts can be used directly in variables within comments. The example shown relates to airline ticketing.
「ＢＴ」から「ＢｅｆｏｒｅＴｒａｖｅｌ」へのキーワードの名称変更が照合用形式をどのように自動的に更新するかを示す図１０のＴＣＡのユーザインタフェースのウィンドウである。示される例は、航空券発券に関係する。 FIG. 17 is a user interface window of the TCA of FIG. 10 showing how renaming a keyword from “BT” to “BeforeTravel” automatically updates the matching forms. The example shown relates to airline ticketing.
日付及びブール値を抽出するＴＣＡの一実施形態のユーザインタフェースのウィンドウである。 FIG. 18 is a user interface window of one embodiment of a TCA that extracts dates and Boolean values.
キーコンセプトのための例示的なブール値及び日付値を示す図１８の実施形態のユーザインタフェースのウィンドウである。示される例は、航空券発券に関係する。 FIG. 19 is a user interface window of the embodiment of FIG. 18 showing exemplary Boolean and date values for key concepts. The example shown relates to airline ticketing.
派生属性及びそれが表す生テキストの一部がツールのヒント（ｔｏｏｌｔｉｐ）としてユーザに提供される、本発明に係るＴＣＡの一実施形態のユーザインタフェースの例示的なウィンドウである。 FIG. 20 is an exemplary user interface window of one embodiment of a TCA according to the present invention, in which a derived attribute and the portion of raw text it represents are presented to the user as a tooltip.
データ項目と集約データ項目の階層関係の一実施形態の概略図である。 FIG. 21 is a schematic diagram of one embodiment of a hierarchical relationship between data items and aggregate data items.

Table 1 is a glossary of terms defined for the purposes of the present invention. Terms defined in Table 1 are capitalized throughout this document. Where a term is not capitalized, it should be given its ordinary meaning unless otherwise indicated.

In a preferred embodiment, the present invention provides a computer-implemented method and system for generating information (such as a Textual Report) that overcomes at least some of the limitations of conventional expert systems by providing a means capable of interpreting large volumes of complex data, including numerical and textual data obtained from different sources. In one embodiment, the invention further provides means for interpreting data presented in various formats, including free-text analysis means that enable the interpretation of data presented as free-form text.

FIG. 4 is a block diagram of one embodiment of a system for generating text from a plurality of data items, indicated generally by the numeral 1. The system 1 may be any system capable of processing information. In this embodiment it can be described as a computer system 1 having a computer program that resides on a computer-readable medium 2 and includes instructions for controlling a central processor 4 of the system. These instructions implement a method 500 for generating textual information, such as the information in a human-readable text report, from a plurality of data items. A flow diagram of the method 500 is shown in FIG. 5.

Alternatively, the generated information is one or more machine instructions rather than textual information presented as a report, and the components of the system 1 are modified accordingly. It should be understood that the term "textual information" is hereafter to be construed broadly, so as to encompass this alternative embodiment where appropriate.

Referring to FIG. 4, the computer-readable medium 2 includes a non-volatile memory 2 in the form of a hard disk drive 2 connected to the processor 4 by a suitable bus 6, such as SCSI. In some embodiments, the non-volatile memory 2 includes, for example, flash memory, a CD, a DVD, or a USB flash memory unit.

One or more data items 8 are received via a data receiver 10, which is part of a communication interface with the other systems or users from which the data items originate. The receiver can receive one or more of:
(a) data items
(b) derived attributes
(c) conclusions.

Each data item 8 represents input data to be processed, for example the results of one or more tests in an investigation, or any other simple or complex data that requires processing.

Such processing can be performed by a builder that constructs a conceptual representation of the information including an interpretation portion, where the interpretation portion represents an operation on an aggregate data item comprising a plurality of data items. In some embodiments, the source of these data items is an information system 37 external to the system 1.

The generated textual information 26 is sent via a data transmitter 11 (referred to as the transmitter), which is part of a communication interface with the other systems or recipients (such as one or more users) that require the textual information. In some embodiments, the destination of the textual information is an information system 37 external to the system 1. The transmitter sends the generated information to one or more of:
(a) a machine
(b) a recipient.

In some embodiments, such as that shown in FIG. 6, the system 3 is an embedded system. Components in FIG. 6 that are similar to components of the system 1 of FIG. 4 are similarly numbered. The embedded system 3 of this embodiment forms part of an instrument that performs tests, such as medical tests. It will be appreciated that, in addition to the architectures illustrated, any suitable architecture may be used, such as terminal/mainframe, client/server, or cloud computing.

In the embodiments shown in FIGS. 4 and 6, the computer-readable medium (for example, a hard drive) 2 holds computer instructions for defining aggregate data items, other Derived Attributes, and rules for generating textual information.

Broadly speaking, a "derived attribute" is a data item that is not present in the original data presented to the knowledge base for interpretation, but is constructed from that data using some formula. The formula is based on any of the following considerations:
(a) the identifier of a data item or other derived attribute
(b) the value of a data item or derived attribute
(c) multiple instances of a data item or derived attribute presented in
i. a time-based sequence
ii. any other vector format.

An aggregate data item is one example of a derived attribute. The original (or "primary") data is presented as a mapping from data items to value pairs. Where historical data is considered, the original data is presented as a map from each data item to a time-based sequence of values for that data item. The data items in the original data are called "primary attributes".

Derived attributes are higher-level concepts that can be used more naturally and more generally than primary attributes in rules and reports. For example, a primary attribute may be the name of the referring doctor. A more useful derived attribute may be "Specialist", which has the value "true" if the referring doctor's name matches a name on a list of specialists. As another example, the primary attributes may be the patient's height and the patient's weight. A useful derived attribute may then be "BMI", a numerical value calculated as the ratio of weight to the square of height.
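The two derived attributes described above can be sketched as follows. This is only an illustration: the attribute keys (`referring_doctor`, `height_m`, `weight_kg`), the specialist names, and the function `derive_attributes` are hypothetical and do not come from the specification.

```python
from typing import Dict

# Hypothetical list of specialists; the names are illustrative only.
SPECIALISTS = {"Dr A. Nguyen", "Dr B. Rossi"}


def derive_attributes(primary: Dict[str, object]) -> Dict[str, object]:
    """Construct derived attributes from primary attributes using simple formulas."""
    height_m = float(primary["height_m"])
    weight_kg = float(primary["weight_kg"])
    return {
        # True when the referring doctor's name matches a name on the list.
        "Specialist": primary["referring_doctor"] in SPECIALISTS,
        # Ratio of weight to the square of height.
        "BMI": weight_kg / height_m ** 2,
    }


# Illustrative case data (primary attributes).
case = {"referring_doctor": "Dr B. Rossi", "height_m": 1.80, "weight_kg": 81.0}
derived = derive_attributes(case)
```

Rules and reports could then refer to `derived["Specialist"]` or `derived["BMI"]` instead of the underlying names, heights, and weights.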

Referring to FIGS. 4 and 6, the system 1 is configured to process the data items 8 by performing the methods of generating text shown in FIGS. 5 and 7. Alternatively, the data items 8 may relate to any field of expertise, such as real estate valuation. The data items 8 for a real estate "test" or appraisal include, for example, the size of the house and land, the orientation of the house, the postal code and recent valuations in the vicinity, or other comparable properties. Other examples of fields of expertise include one or more of fraud detection, bone mineral density reports, medical alerts, genomic reports, molecular reports, and allergy reports. The systems 1, 3 and the method 500 described herein may be configured to pre-process such data items 8.

In the exemplary embodiment of FIG. 4, the system 1 has a data receiver 10 that receives the data items 8, which may or may not then be stored on the hard drive (or other computer-readable medium) 2. In embodiments where the testing is performed at a location remote from the system 1, for example at a remote site 12, the system 1 may be configured to connect to a network 14 to which the remote site 12 is also connected. The network 14 may be a wide area network such as the Internet or a cloud, but the remote site 12 may also be very close, for example in a room adjacent to the system 1, in which case the network 14 may be a local area network or a wireless network such as WiFi or WLAN. Alternatively, in the example shown in FIG. 6, in which the system 3 is part of a testing instrument 5, the data receiver 10 may operate as an interface between the processor 4 and a data source 22. An example of a data source 22 is a sample testing device of the system 3, which performs physical, chemical, or biological tests or other analyses on a sample.

The processor 4 (FIGS. 4 and 6) is programmed as an aggregate data item populator that populates an aggregate data item 24 using at least one of the plurality of data items 8 stored on the hard drive 2 (or other computer-readable medium). In one embodiment, the aggregate data item 24 is a data structure of some kind in the memory 20 for processing by the processor 4 (for example a file, list, array, tree, record, database table, flat file, indexed system, or any other suitable form of data structure). The data items 8 can also be stored in the memory 20. The memory 20 in this embodiment includes CPU registers, on-die SRAM cache, external cache, DRAM, and/or a paging system, virtual memory or swap space on the hard drive (or other computer-readable medium) 2, or any other type of memory. Embodiments may, however, have more or fewer memory types where appropriate.

The processor 4 is programmed to be an information generator that uses the aggregate data item 24 to generate information 26 (for example, as a text report or as one or more machine instructions). The information generator 4 is configured to store the information 26 so generated in the memory 20. In this embodiment, the textual information 26 represents syntactically and/or grammatically correct human-readable text.

The output of the systems 1, 3 is textual information, preferably in human-readable form, for example one or more of: text (such as a text report) displayed on a monitor or screen 28; text printed by a printer 30 on a paper report 33; and an email or other type of electronic message 34 sent by the data transmitter 11 over the network 14 to a user's workstation 32 (for example a physician's or surgeon's computer) or to another information system 37. The textual information generated by the processor 4 may also be textual information such as some other decision support result derived from the data items 8. In one embodiment, an SMS gateway (or other SMS relay mechanism) 34 is instructed by the system 1 to send an electronic message, such as an SMS or email, containing the textual information 26 in human-readable form (that is, syntactically and/or grammatically correct text) to a receiver 36 such as an electronic device. The device 36 can be a mobile phone, smartphone, PDA, or other handheld electronic device, or any other computing device with processing capability. In one embodiment, the system 1 is configured to send an instruction to send an SMS to a handheld mobile device 36. This is advantageous where a test result is abnormal and requires immediate follow-up, or where the result of a check is needed quickly (for example, when checking an airline ticket).

Referring to FIG. 5, one embodiment of a method 500 for generating text, such as the information in a text report, from a plurality of data items is shown. The processor 4 (FIGS. 4 and 6), operating as an aggregate data populator, is programmed to populate the aggregate data item 24. Referring to FIG. 7, another embodiment of a method for generating text is shown. This method includes a sub-step of populating the aggregate data item (labeled 24 in FIGS. 4 and 6) by applying one or more rules to at least one of the plurality of aggregate data items.

The rules may form at least part of a rule-based knowledge/expert system or decision engine. One example of a suitable rule knowledge system is the proprietary system known as RippleDown, disclosed in the applicant's US Patent No. 6,553,361, which is incorporated herein by reference. As described in that US specification, a collection of rules is a knowledge base built by an expert. The rules may be domain-specific. For example, the rules may be specific to the field of allergy testing or to the field of leukemia testing. In some other examples, however, the rules are case-specific, that is, specific to a set of related test results/data items 8. In this case, the system 1 is a knowledge-based or decision support system.

Referring to FIGS. 4 and 6, in one case each data item 8 has an associated name or label portion and a value portion, for example:
Milk, 25
Soy, 30
Peanut, 0

Each of the data items 8 is associated with an identifier (here Milk, Soy, or Peanut) and a value (here 25, 30, or 0). In these embodiments, each data item 8 includes an identifier and a value. The identifier in this example is the name or label of the data item (for example "Milk"), which can be used to generate the textual information 26 (see, for example, step 504 of FIG. 5). The system includes rules capable of translating and/or expressing this identifier in a specified
(a) language (where the generated information is human-readable), or
(b) instruction format (where the generated information is a machine instruction).
This allows the information to be generated in a language or format appropriate to the situation, or as otherwise determined by the rules.

An aggregate data item 24 having the name or label "Very high food allergens" can be populated from the above data items 8 (see, for example, step 502 of FIG. 5) by rules such as the following:
If Milk > 25, include Milk in "Very high food allergens" AND
If Soy > 25, include Soy in "Very high food allergens" AND
If Peanut > 25, include Peanut in "Very high food allergens"
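A minimal sketch of this rule-based population step is given below. The function `populate_aggregate` and the representation of a rule as an (identifier, condition) pair are assumptions made for illustration, not the applicant's implementation. The rules are written with "> 25", but the later example in this description lists Milk (25) as a member of the aggregate, so the sketch assumes the threshold is inclusive.

```python
from typing import Callable, Dict, List, Tuple

# Data items from the example above: identifier -> value.
data_items: Dict[str, int] = {"Milk": 25, "Soy": 30, "Peanut": 0}

# One rule per identifier, mirroring the three rules in the text.
# Assumption: the threshold is inclusive (>= 25) so that Milk (25) qualifies.
Rule = Tuple[str, Callable[[int], bool]]
rules: List[Rule] = [
    ("Milk", lambda value: value >= 25),
    ("Soy", lambda value: value >= 25),
    ("Peanut", lambda value: value >= 25),
]


def populate_aggregate(items: Dict[str, int], rule_list: List[Rule]) -> Dict[str, int]:
    """Return the data items whose rule condition holds, keeping identifier and value."""
    aggregate: Dict[str, int] = {}
    for identifier, condition in rule_list:
        if identifier in items and condition(items[identifier]):
            aggregate[identifier] = items[identifier]
    return aggregate


very_high_food_allergens = populate_aggregate(data_items, rules)
```

Keeping both identifier and value in the aggregate lets a later step use the names in generated text and the values for ordering.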

Alternatively, the aggregate data item 24 (FIGS. 4 and 6) having the name or label "Very high food allergens" can be populated from the above data items 8 by applying a data pre-processing operation (see, for example, step 702 of FIG. 7) such as the following:
"Very high food allergens" are the food allergens in the range (25, 100)
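The range-based pre-processing operation can be sketched as a single filter rather than one rule per data item. The set `FOOD_ALLERGENS`, the extra "Grass pollen" item, and the function `select_in_range` are hypothetical and are included only to show that membership of the food allergen group is checked as well as the value. The range bounds are assumed to be inclusive, consistent with Milk (25) appearing in the aggregate later in this description.

```python
from typing import Dict, Set

# Illustrative data items; "Grass pollen" is invented to show a non-food item being excluded.
data_items: Dict[str, int] = {"Milk": 25, "Soy": 30, "Peanut": 0, "Grass pollen": 40}

# Hypothetical definition of which identifiers count as food allergens.
FOOD_ALLERGENS: Set[str] = {"Milk", "Soy", "Peanut"}


def select_in_range(items: Dict[str, int], members: Set[str], low: int, high: int) -> Dict[str, int]:
    """Keep the items that belong to the group and whose value lies in [low, high]."""
    return {
        name: value
        for name, value in items.items()
        if name in members and low <= value <= high
    }


very_high_food_allergens = select_in_range(data_items, FOOD_ALLERGENS, 25, 100)
```

A single declarative definition of this kind covers any number of food allergens without adding a rule for each one.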

The generation of information from a plurality of data items can involve:
(d) a pre-processor that
i. populates an aggregate data item using at least one of the plurality of data items, and
ii. constructs one or more other derived attributes from the plurality of data items; and
(e) an information generator that generates information using the aggregate data item and the other derived attributes,
wherein:
the information generator forms at least part of a decision support system;
the aggregate data item is a form of derived attribute;
a derived attribute is a data item constructed from the plurality of data items using a formula, so as to extract one or more higher-level concepts from the plurality of data items and thereby allow a condensed quantity of more relevant data to be considered when generating the information; and
the information so generated belongs to one or more of the following groups:
iii. textual information
iv. machine instructions.

The processor 4 (FIGS. 4 and 6) is also programmed as an evaluator that evaluates the outcome of one or more rules using one or more aggregate data items (for example 24), as illustrated above. In the above example, the text information generator 4 generates the textual information for the report 33 according to, for example, the conclusions of the rules.

Thus, the processor 4 (FIGS. 4 and 6) can function as one or more of:
(a) an aggregate data item populator that populates one or more aggregate data items 24 using individual data items 8;
(b) an evaluator that evaluates the conclusions of one or more rules applied to the aggregate data items 24; and
(c) a text information generator that uses the aggregate data items 24 to generate textual information 26 (for example, as a text report or as one or more machine instructions).

It will be appreciated that the processor 4 (FIGS. 4 and 6) may test each data item 8 before including it in the aggregate data item 24. It will also be appreciated that a typical embodiment of a system for generating text (as shown in FIGS. 4 and 6) may include two or more processors 4 that perform the outlined functions in parallel or in series.

In the illustration outlined above, one conceptual representation of the aggregate data item (labeled 24 in FIGS. 4 and 6) having the name "Very high food allergens" is:
Milk, 25
Soy, 30

The text information generator 4 (FIGS. 4 and 6) may be configured to include in the textual information 26 the names or labels associated with the data items 8 that populate the aggregate data item 24. For example, the processor 4 may be asked (for example, at step 504 of the method shown in FIG. 5) to form the textual information:
Very high results were found for "Very high food allergens"

Continuing the same example, the processor 4, functioning as the text information generator 4, can generate textual information representing the text:
Very high results were found for soy and milk

The text information generator (processor) 4 determines that, because soy has a higher value than milk, the best way to present this text is to order the names or labels in the text so that soy comes first. The generator 4 also determines that, because there are only two items in this aggregate data item 24, the word "and" should be placed between "soy" and "milk". The generator 4 includes machine instructions that enable it to determine that, if there is a third item in the aggregate data item 24, such as honey with a value of 26, one grammatically correct text to generate is:
Very high results were found for soy, honey and milk

The text information generator 4 is configured to include in the textual information 26, where required, the values associated with the data items 8 that populate the aggregate data item 24. For example, the above text could instead be:
Very high results were found for soy (30), honey (26) and milk (25)
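The ordering and conjunction behavior described above can be sketched as follows. The functions `join_names` and `render_sentence` and the `include_values` flag are illustrative names, and the sentence wording is one English rendering of the example. The sketch assumes that descending order of value is wanted, which the text presents as only one commonly required ordering.

```python
from typing import Dict, List


def join_names(parts: List[str]) -> str:
    """Join items as 'a', 'a and b', or 'a, b and c'."""
    if len(parts) <= 1:
        return "".join(parts)
    return ", ".join(parts[:-1]) + " and " + parts[-1]


def render_sentence(aggregate: Dict[str, int], include_values: bool = False) -> str:
    """Generate one sentence naming the members of the aggregate, highest value first."""
    if not aggregate:
        return ""  # nothing to report for an empty aggregate
    ordered = sorted(aggregate.items(), key=lambda item: item[1], reverse=True)
    parts = [f"{name} ({value})" if include_values else name for name, value in ordered]
    return "Very high results were found for " + join_names(parts)


two_items = render_sentence({"milk": 25, "soy": 30})
three_items = render_sentence({"milk": 25, "soy": 30, "honey": 26})
with_values = render_sentence({"milk": 25, "soy": 30, "honey": 26}, include_values=True)
```

Here `two_items` names soy before milk with "and" between them, `three_items` uses a comma before the final "and", and `with_values` appends each value in parentheses.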

The above is one example of a commonly required ordering, but other orderings may be appropriate in different situations.

Another such example arises where a laboratory pathologist, in order to produce a patient test report for a referring doctor, may have to interpret the results of, say, hundreds of protein biomarkers used by a diagnostic instrument that has analyzed the patient's blood sample. To make such interpretation possible, the system for generating text organizes the biomarker results into subgroups, each of which can be regarded as a higher-level marker with some diagnostic significance. For example, one group of biomarkers may test for the presence or absence of a particular BCC type of leukemia, and another group may test for the presence or absence of a particular AML type of leukemia.

The system for generating text thereby reduces data complexity by deriving a single result from all of the biomarker results in each subgroup, for example a single value representing the combined result of the markers in the BCC group and a single value representing the combined result of the markers in the AML group. The results for the patient's blood sample are then amenable to interpretation by the laboratory pathologist, who need only consider a much smaller number of higher-level markers.

In addition to simplifying the interpretation process, the report generated by the text generation system and provided by the pathologist to the referring doctor is simplified by using the result values corresponding to groups of markers rather than individual marker values. A report written in terms of groups of markers is more concise and less subject to variation caused by changes in the individual marker values themselves.

The advantage of grouping markers is that the expert system can be built with far fewer rules than if every rule had to refer to individual marker values, because the expert system can follow the human expert's interpretive process and reasoning about group values. Similarly, because varied reports can be written in terms of groups of markers and their group values rather than specific marker values, the expert system can still generate those reports using far fewer report types that require definition by a human expert.

A large number of data item values can also arise from the need to view a single data item historically (on a time basis).

For example, a pathologist monitoring a cardiac enzyme result, such as troponin, may need to interpret the current result in light of all previous results over the past several weeks in order to assess whether to alert an emergency response team. The data volume and complexity are reduced by providing a new higher-level result that represents the rate of change of this time series and so summarizes the important features of the whole time series relative to the current value. The pathologist can then interpret the significance of the current result in light of this higher-level trend result.
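One way such a higher-level trend result might be derived is sketched below. The specification does not give a formula, so the choice of summary (average rate of change per day across the series, plus the mean of the earlier results as a baseline), the function `summarize_trend`, and the sample values are all assumptions for illustration and have no clinical meaning.

```python
from datetime import date
from typing import Dict, List, Tuple


def summarize_trend(series: List[Tuple[date, float]]) -> Dict[str, object]:
    """Condense a time-ordered series (oldest first, at least two points) into a few values."""
    (first_day, first_value), (last_day, current) = series[0], series[-1]
    elapsed_days = (last_day - first_day).days
    earlier_values = [value for _, value in series[:-1]]
    baseline = sum(earlier_values) / len(earlier_values)
    return {
        "current": current,
        "baseline": baseline,  # mean of all results before the current one
        "rate_per_day": (current - first_value) / elapsed_days,
        "rising": current > baseline,
    }


# Illustrative results for one data item over two weeks.
history = [
    (date(2010, 1, 1), 0.01),
    (date(2010, 1, 8), 0.02),
    (date(2010, 1, 15), 0.08),
]
trend = summarize_trend(history)
```

A rule or report could then refer to `trend["rate_per_day"]` or `trend["rising"]` rather than to every historical value.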

In some embodiments, the system does not generate textual information in human-readable form (that is, syntactically and/or grammatically correct text) but instead generates text in the form of one or more machine instructions. In this case, the system includes a machine instruction generator. The machine instructions can control workflow. For example, if the test results indicate that no allergens were detected, a machine instruction can cause the system to send the report automatically, without the report being checked by a human reviewer. Alternatively, a machine instruction may cause an additional test to be performed on a retained sample, or may order that additional test, before the report is generated.
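The workflow-control idea can be sketched as a function that maps an aggregate data item to a list of instructions. The instruction names (`SEND_REPORT`, `ORDER_ADDITIONAL_TEST`, `HOLD_FOR_REVIEW`) and the function `workflow_instructions` are hypothetical; the specification does not define an instruction format.

```python
from typing import Dict, List


def workflow_instructions(very_high_allergens: Dict[str, int]) -> List[str]:
    """Choose machine instructions from the contents of an aggregate data item."""
    if not very_high_allergens:
        # No allergens detected: release the report without human review.
        return ["SEND_REPORT"]
    # Otherwise request further testing and keep the report for a reviewer.
    return ["ORDER_ADDITIONAL_TEST", "HOLD_FOR_REVIEW"]


auto_release = workflow_instructions({})
needs_follow_up = workflow_instructions({"Soy": 30})
```

In this sketch the same aggregate data item that drives the report text also drives the workflow decision.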

In another embodiment, the systems 1, 3 (FIGS. 4 and 6) can include a receiver 36 that receives the text report or other output. Referring to FIG. 8, another embodiment of a method for generating text includes a step (step 802) that allows a user 38 to enter a conceptual representation of the text using a keyboard or other input device connected to the CPU. The conceptual representation is stored by the system in the non-volatile memory 2. A "conceptual representation" is an expression of rule conditions in terms of original data items or derived attributes (derived data items), including aggregate data items. Using the above example, the conceptual representation entered by the operator takes the form of the following "pseudo-text":
Very high results were found for "Very high food allergens"

The pseudo-text in this example is a compact, informal description of a conclusion/decision based on analysis of the individual test results being collated. The pseudo-text represents a high-level description of the text desired by the operator but, importantly, omits the details that the systems 1, 3 are intended to compute. Pseudo-text is a natural-language description of the computational details. It is easier for a human to write and read than a more technical description of the desired text of the kind that could be achieved using a programming or scripting language.

概念表現は、解釈部分を含む。このケースでは、解釈部分は
「非常に高い食品アレルゲン」
である。 The conceptual representation includes an interpretation part. In this case, the interpretation part is “very high food allergens”.

解釈部分は、名称「非常に高い食品アレルゲン」を有する集約データ項目に対する操作を表す。図８（工程８０２）を参照すると、テキストを生成する方法８００の一実施形態では、システム１、３において、ユーザ３８は、解釈部分を含む疑似テキストとして、テキストの概念表現を入力する。データ項目８を受け取ると、テキスト情報生成器４は、本文書の他の箇所で説明されるように、その解釈部分からテキスト情報２６を生成する（図８の工程８０４を参照）。テキスト情報生成器４は、データ項目の各々に関連付けられた一つ以上の名称又はラベルをテキスト情報２６に含めるように構成される。テキスト情報生成器４は、複数のデータ項目のための集団名（識別子）をテキスト情報２６に含めるように更に構成されてもよい。テキスト情報生成器４は、テキスト情報２６をテキストの概念表現のリテラル部分（逐語的部分）と統合するように更に構成されてもよい。この例示では、
大豆、蜂蜜及びミルクに対して非常に高い結果が見出された
となる。 The interpretation part represents an operation on the aggregate data item named “very high food allergens”. Referring to FIG. 8 (step 802), in one embodiment of a method 800 for generating text, the user 38 enters into the systems 1, 3 a conceptual representation of the text as pseudo-text that includes an interpretation part. Upon receiving the data items 8, the text information generator 4 generates text information 26 from the interpretation part, as described elsewhere in this document (see step 804 in FIG. 8). The text information generator 4 is configured to include in the text information 26 one or more names or labels associated with each of the data items. The text information generator 4 may be further configured to include in the text information 26 a collective name (identifier) for a plurality of data items. The text information generator 4 may be further configured to integrate the text information 26 with the literal (verbatim) part of the conceptual representation of the text. In this example, the result is:
Very high results were found for soy, honey and milk

図４及び図６に示される実施形態では、テキスト情報生成器４は、集約データ項目２４の特性を決定するように構成される。例えば、テキスト情報生成器４は、
（ａ）集約データ項目を構成するデータ項目の数を決定すること
（ｂ）集約データ項目が空であるかどうかを決定すること
（ｃ）集約データ項目が特定のデータ項目を含むかどうかを決定すること
のうちの一つ以上を含むように構成してもよい。 In the embodiment shown in FIGS. 4 and 6, the text information generator 4 is configured to determine characteristics of the aggregate data item 24. For example, the text information generator 4 may be configured to perform one or more of:
(a) determining the number of data items that make up the aggregate data item;
(b) determining whether the aggregate data item is empty; and
(c) determining whether the aggregate data item contains a particular data item.
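The three characteristics listed above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the class `AggregateDataItem`, its method names, and the numeric values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AggregateDataItem:
    """A named group of data items (hypothetical structure, for illustration)."""
    name: str
    items: dict = field(default_factory=dict)  # data item name -> value

    def number_of(self) -> int:
        # (a) number of data items that make up the aggregate
        return len(self.items)

    def is_empty(self) -> bool:
        # (b) whether the aggregate is empty
        return not self.items

    def contains(self, item_name: str) -> bool:
        # (c) whether the aggregate contains a particular data item
        return item_name in self.items


very_high = AggregateDataItem(
    "very high food allergens",
    {"soy": 95.0, "honey": 88.0, "milk": 91.0},  # made-up values
)
```

Keeping the items in a dictionary preserves insertion order, which may be convenient if the names are later listed in generated text.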

これらは、テキストを生成する方法及びシステムの実施形態において集約データ項目に施す操作の例である。例えば、テキスト情報２６は、
（非常に高い食品アレルゲンの数）個の食品アレルゲンに対して非常に高い結果が見出された
などの疑似テキストから生成されて（図８の工程８０４）、
３個の食品アレルゲンに対して非常に高い結果が見出された
となる。 These are examples of operations performed on aggregate data items in embodiments of the method and system for generating text. For example, the text information 26 is generated (step 804 in FIG. 8) from pseudo-text such as
Very high results were found for (number of very high food allergens) food allergens
to give
Very high results were found for 3 food allergens

このように、テキスト情報生成器４は、集約データ項目の決定された特性についての情報をテキスト情報２６に含めるように構成される。「Ｎｕｍｂｅｒｏｆ」（「の数」）は、集約データ項目「非常に高い食品アレルゲン」に対して行われる操作の一種である。 Thus, the text information generator 4 is configured to include in the text information 26 information about the determined characteristics of the aggregate data item. “Number of” (“number of”) is a type of operation performed on the aggregated data item “very high food allergen”.
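One way to picture how pseudo-text with an interpretation part becomes text information is the sketch below. The concrete syntax is an assumption made for illustration: a name in straight double quotes is replaced by the names of the data items in that aggregate, and `(number of NAME)` is replaced by their count. The source does not specify the actual syntax, and `AGGREGATES`, `join_names`, and `render` are hypothetical names.

```python
import re

# Hypothetical aggregate data items: name -> ordered list of data item names.
AGGREGATES = {
    "very high food allergens": ["soy", "honey", "milk"],
}


def join_names(names):
    """Join names as 'a', 'a and b', or 'a, b and c'."""
    if len(names) <= 1:
        return "".join(names)
    return ", ".join(names[:-1]) + " and " + names[-1]


def render(pseudo_text, aggregates=AGGREGATES):
    """Replace interpretation parts; literal parts are copied through unchanged.

    Raises KeyError if the pseudo-text names an unknown aggregate.
    """
    def count(match):
        # (number of NAME) -> number of data items in the aggregate
        return str(len(aggregates[match.group(1)]))

    def names(match):
        # "NAME" -> list of the data item names in the aggregate
        return join_names(aggregates[match.group(1)])

    text = re.sub(r"\(number of ([^)]+)\)", count, pseudo_text)
    return re.sub(r'"([^"]+)"', names, text)
```

With these assumptions, rendering `Very high results were found for "very high food allergens"` lists the three allergens by name, and the `(number of ...)` form yields the count instead.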

集約データポピュレータ４（図４及び図６）は、集約データ項目２４を一つ以上の他の集約データ項目を用いてポピュレートするように構成してもよい。前者の集約データ項目は、関連のある複数のデータ項目、例えば患者が高いアレルギーを示すことが見出された全ての食品を含んでいてもよい。したがって、集約データ項目「食品」は、それもまた集約データ項目（例えば、ピーナッツ、ツリーナッツ。ツリーナッツもまた、アーモンド、ブラジルナッツ、クルミ、ヘーゼルナッツ、マカダミア、ピスタチオ、ピーカン、及びカシューナッツ等のデータ項目を包含していてもよい）であるデータ項目（例えば、ナッツ）を用いてポピュレートされてもよい。 The aggregate data populator 4 (FIGS. 4 and 6) may be configured to populate an aggregate data item 24 with one or more other aggregate data items. The former aggregate data item may include a plurality of related data items, for example all foods to which the patient has been found to be highly allergic. Thus, the aggregate data item “food” may be populated with a data item (e.g., nuts) that is itself an aggregate data item (e.g., of peanuts and tree nuts; tree nuts may in turn be an aggregate containing data items such as almonds, Brazil nuts, walnuts, hazelnuts, macadamias, pistachios, pecans, and cashews).

集約データポピュレータ４は、集約データ項目に関連付けられた集約データ項目名（識別子）をテキストに含めるように構成されてもよい。集約データポピュレータ４は、テキスト内における集約データ項目２４の名称の順序を決定するように構成されてもよい。一実施形態では、集約データ項目ポピュレータ４は、二つ以上の他の集約データ項目２４に演算を施すように構成される。例えば、一つの集約データ項目２４は、非常に高い結果の食品アレルゲンであってもよく、他の集約データ項目２４は、関心のある食品アレルゲンであってもよい。その場合、ポピュレータ４は、二つの集約データ項目の積集合を取ることによって、新しい集約データ項目２４、例えば、非常に高い結果の関心のある食品アレルゲンを生成してもよい。他の可能な演算子には、差（ｄｉｆｆｅｒｅｎｃｅ）、和（ｕｎｉｏｎ）、及び積（ｉｎｔｅｒｓｅｃｔｉｏｎ）が含まれる。別の実施形態では、集約データ項目ポピュレータ４は、別の集約データ項目を構成するどのデータ項目が特定の範囲内の値を有するかを決定するように構成される。 The aggregate data populator 4 may be configured to include in the text the aggregate data item name (identifier) associated with an aggregate data item. The aggregate data populator 4 may be configured to determine the order of the names of the aggregate data items 24 in the text. In one embodiment, the aggregate data item populator 4 is configured to perform operations on two or more other aggregate data items 24. For example, one aggregate data item 24 may be the food allergens with very high results, and another aggregate data item 24 may be the food allergens of interest. In that case, the populator 4 may generate a new aggregate data item 24, for example the food allergens of interest with very high results, by taking the intersection of the two aggregate data items. Other possible operators include difference, union, and intersection. In another embodiment, the aggregate data item populator 4 is configured to determine which data items making up another aggregate data item have a value within a particular range.
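The set operations and the range condition described above map directly onto ordinary set arithmetic. The sketch below models each aggregate data item as a Python set of data item names; the item names, the result values, and the range bounds are made up for illustration.

```python
# Hypothetical aggregates, modelled as sets of data item names.
very_high_foods = {"soy", "honey", "milk"}
foods_of_interest = {"milk", "peanut", "soy"}

# Intersection: very high food allergens that are also of interest.
very_high_of_interest = very_high_foods & foods_of_interest
# Union and difference, the other operators named in the text.
all_flagged = very_high_foods | foods_of_interest
very_high_not_of_interest = very_high_foods - foods_of_interest

# Range condition: which pollen items have a value within [20, 50]?
pollen_results = {"ryegrass": 35.0, "birch": 12.5, "plantain": 50.0}  # made-up values
pollen_in_range = {
    name for name, value in pollen_results.items() if 20 <= value <= 50
}
```

Plain sets do not record an order, so a system that also decides the order in which names appear in the text would need an ordered structure or a separate sort step.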

一実施形態では、テキスト情報２６を生成する工程は、集約データ項目２４の決定された特性についての情報をテキスト情報２６に含める工程を含む。例えば、決定された特性が、集約データ項目を構成する項目の最大値である場合、テキスト情報は、「最も高い花粉アレルゲンは、結果＜最も高い花粉アレルゲンの値＞ｍｍｏｌ／Ｌを有する＜最も高い花粉アレルゲン＞であった」との文を含んでもよく、ここで、＜最も高い花粉アレルゲン＞は、最も高い値を有する花粉アレルゲンと定義された花粉アレルゲン集約データ項目の特性であり、＜最も高い花粉アレルゲンの値＞は、値それ自体である。 In one embodiment, generating the text information 26 includes including in the text information 26 information about a determined characteristic of the aggregate data item 24. For example, if the determined characteristic is the maximum value of the items making up the aggregate data item, the text information may include the sentence “The highest pollen allergen was <highest pollen allergen>, with a result of <value of highest pollen allergen> mmol/L”, where <highest pollen allergen> is a characteristic of the pollen allergen aggregate data item, defined as the pollen allergen having the highest value, and <value of highest pollen allergen> is the value itself.

幾つかの実施形態では、テキスト情報生成器４は、プログラムフローを制御するために、集約データ項目２４に一つ以上のルールを適用するように構成される（例えば、図７の工程７０２参照）。そのようなルールに関連付けられる論理検査の一例は、
Ｉｆ適度な食品の数＞１ＡＮＤｉｆ症状の数＞１ＡＮＤ非常に高い食品の数＋食品の数＝０
である。 In some embodiments, the text information generator 4 is configured to apply one or more rules to the aggregate data items 24 in order to control program flow (see, e.g., step 702 in FIG. 7). An example of a logical test associated with such a rule is
If number of moderate foods > 1 AND if number of symptoms > 1 AND number of very high foods + number of foods = 0

このようなルールに関連付けられるワークフロー動作は、病理学者へのレポートを自動的に委託医師に開示するのではなく、検査結果及びレポートを検討のための待ち行列に入れることであってもよい。 The workflow action associated with such a rule may be to queue the test results and the report for review by a pathologist, rather than automatically releasing the report to the referring physician.
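A rule of this kind, together with its workflow action, might be sketched as below. The counts are assumed to have been computed from the aggregate data items beforehand; the dictionary keys follow the wording of the example rule literally, and the action names `queue_for_review` and `auto_release` are hypothetical.

```python
def workflow_action(counts):
    """Return a workflow action name for the example rule.

    `counts` maps an aggregate data item name to its number of data items.
    The condition is transcribed from the example rule as written.
    """
    rule_fires = (
        counts["moderate foods"] > 1
        and counts["symptoms"] > 1
        and counts["very high foods"] + counts["foods"] == 0
    )
    # If the rule fires, hold the report for human review instead of
    # releasing it automatically.
    return "queue_for_review" if rule_fires else "auto_release"
```

Separating the Boolean test from the returned action keeps the rule readable and lets the same test drive either text generation or a machine instruction.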

集約データ項目は、ルールを構成するブール条件の評価において使用される場合、テキスト情報２６を生成するためのデータ項目として扱えることが分かる。集約データ項目２４へのポピュレートには、一つ以上の他の集約データ項目を用いて集約データ項目２４をポピュレートすることが含まれていてもよく、他の集約データ項目の各々は、名称又はラベルの形式を取る関連集約識別子を有していてもよい。集約データ項目へのポピュレート（例えば、図５の工程５０２）は、二つ以上の別の集約データ項目を組み合わせることによって（例えば、和又は積演算）、又は別の集約データ項目を構成するどのデータ項目が特定の範囲内の値を有するか（例えば、範囲［２０〜５０］内の花粉項目）を決定するなど、より汎用的な条件を適用することによって達成することができる。 It can be seen that an aggregate data item can be treated as a data item for generating the text information 26 when it is used in evaluating the Boolean conditions that make up a rule. Populating the aggregate data item 24 may include populating it with one or more other aggregate data items, each of which may have an associated aggregate identifier in the form of a name or label. Populating an aggregate data item (e.g., step 502 in FIG. 5) can be achieved by combining two or more other aggregate data items (e.g., by a union or intersection operation), or by applying a more general condition, such as determining which data items making up another aggregate data item have a value within a particular range (e.g., pollen items in the range [20-50]).

その場合、集約識別子（名称又はラベル）は、データ項目名をテキスト情報２６内で使用する場合とちょうど同じようにテキスト情報２６内で使用することができる。ここでも、テキスト情報２６内における集約名の順序をテキスト情報生成器４が決定してもよい。 In that case, the aggregation identifier (name or label) can be used in the text information 26 just as the data item name is used in the text information 26. Again, the text information generator 4 may determine the order of the aggregate names in the text information 26.

システム及び方法の幾つかの実施形態は、知識ベース（ナレッジベース）による解釈に先立ってデータ複雑性を低減する、新しい又は改良されたデータ前処理方法を含み、この方法は、
（ａ）個人データ項目をデータの一つ以上のサブセット（部分集合）にグループ化する工程（各サブセットグループは、集約データ項目と呼ばれる）、
（ｂ）統計値（例えば、最大値、最小値、グループサイズ、中間値、平均値、最頻値、又は他の任意の統計値）、又は各集約データ項目についての他の数値、ブール値、若しくはテキスト値（以下、「集約」値）を計算する工程、
（ｃ）集約データ項目の集まりに施される指定された演算（例えば、和や積）を更に実行して、他の集約データ項目を生成する工程（例えば、各々が特定の癌マーカーの集まりを表す集約データ項目「ＢＣＬＬ診断」、「ＡＭＬ診断」、「ＢＣＬＬ支持」、「ＡＭＬ支持」の和集合が、全ての白血病癌マーカーから成る別の集約データ項目「白血病」を表してもよい）、
（ｄ）自由形式テキストからなる値を有するデータ項目から一つ以上のデータ項目及び値を生成する工程、及び／又は、
（ｅ）一連の値に関連付けられたデータ項目から一つ以上のデータ項目及び値を生成する工程
を含む。 Some embodiments of the system and method include a new or improved data preprocessing method that reduces data complexity prior to interpretation by a knowledge base, the method comprising:
(a) grouping individual data items into one or more subsets of the data (each subset group is called an aggregate data item);
(b) calculating a statistic (e.g., maximum, minimum, group size, median, mean, mode, or any other statistic) or another numeric, Boolean, or text value (hereinafter an “aggregate” value) for each aggregate data item;
(c) further performing specified operations (e.g., union or intersection) on a collection of aggregate data items to generate other aggregate data items (e.g., the union of the aggregate data items “BCLL diagnosis”, “AML diagnosis”, “BCLL support”, and “AML support”, each representing a collection of specific cancer markers, may represent another aggregate data item, “leukemia”, consisting of all leukemia cancer markers);
(d) generating one or more data items and values from a data item whose value consists of free-form text; and/or
(e) generating one or more data items and values from a data item associated with a series of values.
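Steps (a) to (c) of the preprocessing method can be sketched as follows. The marker names, their values, and their assignment to groups are placeholders invented for the example and carry no clinical meaning; `aggregate_values` is a hypothetical helper.

```python
from statistics import mean, median

# Hypothetical individual data items: marker name -> test value (made up).
results = {"M1": 0.82, "M2": 0.91, "M3": 0.40, "M4": 0.05, "M5": 0.10}

# (a) Group individual data items into named subsets (aggregate data items).
groups = {
    "BCLL diagnosis": {"M1", "M2"},
    "BCLL support": {"M3"},
    "AML diagnosis": {"M4", "M5"},
}


def aggregate_values(members, data):
    """(b) Compute aggregate values for one aggregate data item."""
    values = [data[m] for m in sorted(members) if m in data]
    if not values:
        return {"size": 0}
    return {
        "size": len(values),
        "max": max(values),
        "min": min(values),
        "mean": mean(values),
        "median": median(values),
    }


# (c) A further aggregate formed as the union of the existing aggregates.
groups["leukemia"] = set().union(*groups.values())

summary = {name: aggregate_values(members, results) for name, members in groups.items()}
```

After this step a rule only needs to look at, say, `summary["BCLL diagnosis"]["max"]` rather than at every individual marker value.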

これにより、データ前処理方法の一つの態様は、個人データ項目及びそれらの値の集まりを検討し、そして、グループ化、フィルタリング、マッピング、相関付け、又は他の処理により、各々が値を有する派生属性（ＤｅｒｉｖｅｄＡｔｔｒｉｂｕｔｅｓ）（集約データ項目を含む）を生成することによって、このデータの複雑性を低減する。 Thus, one aspect of the data preprocessing method reduces the complexity of the data by considering a collection of individual data items and their values and generating, by grouping, filtering, mapping, correlation, or other processing, derived attributes (including aggregate data items), each having a value.

データ前処理方法の別の態様は、データ項目の複雑な自由形式テキスト値を検討し、ストリングパターンマッチング及びフィルタリングの処理により、各々が値を有する、他のより単純なデータ項目を生成することによって、複雑性を低減する。 Another aspect of the data pre-processing method is by examining the complex free-form text values of the data items and generating other simpler data items, each having a value, through string pattern matching and filtering processes. Reduce complexity.

データ前処理方法の別の態様は、一連の値に関連付けられたデータ項目を検討し、フィルタリング、傾向分析（ｔｒｅｎｄａｎａｌｙｓｉｓ）、又は他の分析により、各々が値を有する、他のより単純なデータ項目を生成することによって、複雑性を低減する。この方法は、一組の原データ項目における各個別データ値又は自由形式テキスト若しくはシーケンスである複雑なデータ項目値を検討することを必要とする代わりに、単一の派生データ項目及びその値の検討も可能にする。ここで、「派生」データ項目及びその値は、前処理によって構成されたデータ項目及び値を指し、「集約」データ項目を含む。これは、解釈する必要があるデータ値の量及び複雑性を著しく低減させ、したがって、（生成されるテキストレポートにおいて後で表現される）判断又は結論に達するために必要とされるルール及び決定点の数も著しく低減する。集約データ項目及びそれらの値は、知識ベースの出力として使用することもでき、得られるレポートテキストの複雑性を大きく低減する。 Another aspect of the data pre-processing method is to examine data items associated with a set of values and filter, trend analysis, or other analysis to provide other simpler data, each having a value. Reduce complexity by generating items. This method considers a single derived data item and its value, instead of having to consider each individual data value or a complex data item value that is a free-form text or sequence in a set of original data items. Also make it possible. Here, the “derived” data item and the value thereof indicate the data item and the value configured by the preprocessing, and include the “aggregation” data item. This significantly reduces the amount and complexity of the data values that need to be interpreted, and thus the rules and decision points needed to reach a judgment or conclusion (represented later in the generated text report) Is also significantly reduced. Aggregated data items and their values can also be used as knowledge base output, greatly reducing the complexity of the resulting report text.

他の実施形態では、テキストを生成するシステム及び方法は、様々な形式で提示されるデータを解釈する手段を更に含む。自由テキストデータ項目の解釈を可能にする自由テキスト解析手段も、このデータ解釈手段に含まれる。自由テキスト解析手段は、自由テキストデータ項目を前処理する方法を実行し、その方法は、テキストデータ内の「正規表現」を、以下のグループ
（ａ）長大な自由テキストデータ項目を解釈することを必要とする代わりに、データ項目の著しく単純な「標準的（ｃａｎｏｎｉｃａｌ）」表現を検討することを可能にする一連のキーワード
のうちの一つ以上にマッピングする工程と、
（ｂ）複雑なテキストデータ項目を多数のより単純な「原子的」データ項目に割り当てる工程であって、各原子的データ項目の値が、
ｉ．ブール値（例えば、真又は偽、はい又はいいえ）、
ｉｉ．有限の列挙（「ａ」、「ｂ」、「ｃ」）、又は
ｉｉｉ．数値
のうちの一つである、工程と
を含む。 In other embodiments, the system and method for generating text further include means for interpreting data presented in various formats. This data interpretation means includes free text analysis means that allow free text data items to be interpreted. The free text analysis means performs a method of preprocessing free text data items, the method comprising:
(a) mapping “regular expressions” in the text data to one or more of a set of keywords, so that a significantly simpler “canonical” representation of the data item can be considered instead of having to interpret a lengthy free text data item; and
(b) assigning a complex text data item to a number of simpler “atomic” data items, wherein the value of each atomic data item is one of:
i. a Boolean value (e.g., true or false, yes or no),
ii. a finite enumeration (“a”, “b”, “c”), or
iii. a numeric value.

本明細書で説明されるような複雑なデータ項目の前処理のための新しい又は改良された方法を提供することによって、好ましい実施形態は、従来のエキスパートシステムの制約の少なくとも幾つかを克服し、大量の複雑なデータ（異なるソースから取得され、自由形式テキストを含む様々な形式で提示される数値データ及びテキストデータを含む）の解釈を可能にする。好ましい実施形態は、複雑なデータを知識又は判断（解釈されたデータに基づいた結論、結果、又は他の発見を含む）に変換する。知識又は判断は、テキストレポート内でテキスト情報（機械命令を含む）として表現される。 By providing a new or improved method for preprocessing complex data items as described herein, the preferred embodiment overcomes at least some of the limitations of conventional expert systems, Allows the interpretation of large amounts of complex data (including numerical and textual data obtained from different sources and presented in various formats including free-form text). Preferred embodiments convert complex data into knowledge or judgment (including conclusions, results, or other findings based on interpreted data). Knowledge or judgment is expressed in the text report as text information (including machine instructions).

データ前処理方法は、フィルタリング、グループ化、マッピング、及び他の操作によって、データ複雑性を扱いやすい程度まで低減する。例えば、解釈すべき数百のタンパク質バイオマーカー検査値が存在する場合、フィルタリング操作は、特定の患者に関連しないある種の結果を除外してもよい。この方法は、一つ以上のデータ項目を取り、ある式を適用して、それらの（一つ以上の）データ項目を処理し、派生属性にする工程も含む。派生属性は、より扱いやすい。というのも、派生属性は、原データ項目からより上位（高レベル）でより重要な情報を抽出し、したがって、解釈すべきデータを減少させて、扱いやすくするからである。データ前処理方法は、関連する複数のデータ項目を一つ以上のデータサブセットにグループ化するグループ化操作を含む。ここで、各サブセットグループは、集約データ項目と呼ばれる。現在の例を続けると、グループ化操作は、関連バイオマーカーの特定のサブセットの値を収集し、各サブセットについて、例えば最大値などの統計値を計算することができる。そのため、テキストを生成する方法及びシステムは、個々のバイオマーカーを解釈しなければならない代わりに、各グループについて単一のデータ値を検討しさえすればよく、検討すべきデータ値の数を著しく低減する。 The data preprocessing method reduces data complexity to a manageable level through filtering, grouping, mapping, and other operations. For example, if there are hundreds of protein biomarker test values to be interpreted, a filtering operation may exclude certain results that are not relevant to a particular patient. The method also includes taking one or more data items and applying a formula that processes those data items into a derived attribute. Derived attributes are easier to handle because they extract higher-level, more significant information from the original data items and so reduce the data to be interpreted. The data preprocessing method includes a grouping operation that groups related data items into one or more data subsets, where each subset group is called an aggregate data item. Continuing the current example, the grouping operation can collect the values of a particular subset of related biomarkers and calculate a statistic, such as the maximum, for each subset. As a result, instead of having to interpret individual biomarkers, the method and system for generating text need only consider a single data value for each group, which significantly reduces the number of data values to be considered.

ある患者のテキスト病歴又は他のテキストデータなど、特定のデータ項目が複雑な場合、マッピング操作が、テキスト内のパターン（「正規表現」）を探し、これらのパターンを一連のキーワードにマッピングすることができる。そのため、テキストを生成する方法及びシステムは、長大な自由テキストデータ項目を解釈しなければならない代わりに、このテキスト項目の著しく単純な「標準的」表現を考えさえすればよい。病歴の多数のバリエーションが、同じ単純な標準的表現をもたらすことがあり、やはり、著しく少数のルール及び決定点を使用して解釈を行うことを可能にする、より容易な解釈を可能にする。 When certain data items are complex, such as a patient's text history or other text data, the mapping operation may look for patterns in the text (“regular expressions”) and map these patterns to a set of keywords. it can. Thus, instead of having to interpret a large free text data item, the method and system for generating text need only consider a remarkably simple “standard” representation of this text item. Numerous variations in medical history can result in the same simple standard representation, again allowing easier interpretation that allows interpretation using significantly fewer rules and decision points.

マッピングの別の例は、テキストのパターンをキーワードに割り当てる代わりに、複雑なテキストデータ項目を、多数のより単純な「原子的」データ項目、すなわち、各原子的データ項目の値がブール値（「真」（ｔｒｕｅ）若しくは「偽」（ｆａｌｓｅ）、はい若しくはいいえ）、有限な列挙（「ａ」、「ｂ」、「ｃ」）、又は数値であるデータ項目に割り当てる。複雑な病歴から割り当てられる原子的データ項目の一例は、「真」又は「偽」の値を有する「糖尿病ステータス」と呼ばれるデータ項目であってもよい。別の例は、列挙値「ビグアニド」、「メグリチニド」、又は「スルホニル尿素」を有する「糖尿病薬」と呼ばれるデータ項目であってもよい。このようにして、病歴内に含まれる選択された重要な概念が抽出され、別の標準的な方式で表現される。 In another example of mapping, instead of assigning patterns of text to keywords, a complex text data item is assigned to a number of simpler “atomic” data items, i.e., data items in which the value of each atomic data item is a Boolean (“true” or “false”, yes or no), a finite enumeration (“a”, “b”, “c”), or a number. One example of an atomic data item assigned from a complex medical history may be a data item called “diabetes status” with a value of “true” or “false”. Another example may be a data item called “diabetes drug” with the enumerated values “biguanide”, “meglitinide”, or “sulfonylurea”. In this way, selected key concepts contained in the medical history are extracted and expressed in a separate, canonical form.
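The idea of atomic data items can be illustrated with a small sketch that derives a Boolean, an enumerated, and a numeric item from a free-text history. The regular expressions are deliberately crude and are assumptions made for the example; the numeric item `age` is not taken from the source and is included only to show a numeric value.

```python
import re

# Enumerated values named in the text.
DRUG_CLASSES = ("biguanide", "meglitinide", "sulfonylurea")


def extract_atomic_items(history):
    """Derive simple data items from a free-text medical history (illustrative only)."""
    text = history.lower()
    # Boolean item: crude keyword test that ignores negation ("no diabetes").
    diabetes_status = re.search(r"\bdiabet(es|ic)\b", text) is not None
    # Enumerated item: first listed drug class mentioned, else None.
    drug = next((d for d in DRUG_CLASSES if re.search(rf"\b{d}s?\b", text)), None)
    # Numeric item (hypothetical): age written like "54 year old" or "54-year-old".
    age_match = re.search(r"\b(\d{1,3})[- ]year[- ]old\b", text)
    age = int(age_match.group(1)) if age_match else None
    return {"diabetes status": diabetes_status, "diabetes drug": drug, "age": age}
```

Rules can then test `diabetes status` directly instead of re-reading the history text, which is the simplification the passage describes.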

これらの全ての例では、複雑なデータが、解釈を容易にするために、より単純なデータ項目へと前処理される。 In all these examples, complex data is preprocessed into simpler data items to facilitate interpretation.

本発明の一実施形態では、集約データポピュレータ装置又はツール（例えば、データベース構造）が、複数のデータ項目を受け取る。ここで、各データ項目は、例えば、複数の検査のうちの一つの結果に対応する。この例示では、複数の検査の結果は、
（ａ）患者の状態の調査（例えば、患者が特定の形態の疾病又はアレルギーを有するか）
（ｂ）大量のデータの監査（例えば、航空券を再発行すべきかどうかを判断する場合に必要とされるもの）、又は
（ｃ）情報を抽出し、或いは決定に到達するために、大量の複雑なデータ項目（テキストレポート内の列挙データ及び数値データを含む）を解析する必要がある基本的に任意の解析
において使用される。 In one embodiment of the invention, an aggregate data populator device or tool (e.g., a database structure) receives a plurality of data items, each of which corresponds, for example, to the result of one of a plurality of tests. In this example, the results of the plurality of tests are used in:
(a) an investigation of a patient's condition (e.g., whether the patient has a particular form of disease or allergy);
(b) an audit of a large amount of data (e.g., as required when determining whether an airline ticket should be reissued); or
(c) essentially any analysis in which a large number of complex data items (including enumerated and numeric data in text reports) must be analyzed to extract information or reach a decision.

図５及びタンパク質バイオマーカー検査の例を参照すると、複数のタンパク質バイオマーカー検査の検査値は、複数のサブセットデータグループ（集約データ項目）にグループ化される。言い換えると、各集約データ項目は、個々のタンパク質バイオマーカー検査値の集まりからポピュレートされる（工程５０２）。 Referring to FIG. 5 and the example of protein biomarker test, test values of a plurality of protein biomarker tests are grouped into a plurality of subset data groups (aggregated data items). In other words, each aggregated data item is populated from a collection of individual protein biomarker test values (step 502).

テキスト生成用システムのこの実施形態では、装置（集約データ項目ポピュレータ）は、様々な種類のデータ項目を（一つ以上の）適切な集約データ項目に関連付ける所定のデータ構造の形式での情報を含む。このデータ構造は、装置が、受け取ったデータを処理する様々なルールを適用することによって、受け取ったデータ項目の一つ以上を用いて所定の集約データ項目をポピュレートすることを可能にする。言い換えると、集約データ項目ポピュレータは、個人データ項目を関連する集約データ項目にマッピングすることによって、関連する集約データ項目をポピュレートする。「集約データ項目ポピュレータ」は、個人データ項目をどのようにマッピングすべきかを決定する一組のルールを含む。個人データ項目（プライマリ属性及び派生属性を含む）は、名称、種類、値によって、又は別の集合のメンバーシップによって、一つの集約データ項目にマッピングされる。言い換えると、集約データ項目は、集合メンバーシップに従って、個人データ項目を用いてポピュレートされる。現在の例では、集約データ項目のうちの一つにおける各データ項目は、例えば、特定の疾病又はアレルギーについての関連するバイオマーカーである。航空運賃表の例を使用すると、集約データ項目のうちの一つにおける各データ項目は、例えば、チケット再発行のための関連する条件とすることができる。 In this embodiment of the text generation system, the device (the aggregate data item populator) includes information, in the form of a predetermined data structure, that associates various types of data items with the appropriate aggregate data item or items. This data structure allows the device to populate a given aggregate data item with one or more of the received data items by applying various rules for processing the received data. In other words, the aggregate data item populator populates the relevant aggregate data items by mapping individual data items to them. The “aggregate data item populator” includes a set of rules that determine how individual data items are to be mapped. An individual data item (including primary and derived attributes) is mapped to an aggregate data item by name, type, or value, or by membership of another set. In other words, aggregate data items are populated with individual data items according to set membership. In the current example, each data item in one of the aggregate data items is, for example, a relevant biomarker for a particular disease or allergy. Using the airfare table example, each data item in one of the aggregate data items can be, for example, a relevant condition for reissuing a ticket.
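A populator driven by mapping rules might look like the sketch below. Each rule is a predicate over a data item's type and value; the `DataItem` fields, the rule contents, and the threshold of 80 are assumptions chosen for illustration and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    name: str
    kind: str      # hypothetical type label, e.g. "food allergen"
    value: float


# Each rule decides whether a data item belongs to the named aggregate.
RULES = {
    "food allergens": lambda d: d.kind == "food allergen",
    "very high food allergens": lambda d: d.kind == "food allergen" and d.value >= 80,
}


def populate(items, rules=RULES):
    """Return aggregate name -> list of data item names, in input order."""
    return {agg: [d.name for d in items if rule(d)] for agg, rule in rules.items()}
```

Because membership is decided by rules rather than by fixed lists, adding a new allergen to the input requires no change to the aggregates themselves.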

他の実施形態（図９の工程９０２参照）では、データを前処理する工程は、自由形式テキストで表現されたデータを抽出する方法（本書において後でより詳細に説明する。工程９０２）を含む。この部分の議論のため、テキストを生成するシステム及び方法の一実施形態は、自由形式テキストを含む様々な手法で表現されたデータをテキスト要約器属性（ｔｅｘｔｃｏｎｄｅｎｓｅｒＡｔｔｒｉｂｕｔｅ）を使用して抽出する手段を含む。そのようにして抽出されたデータ項目は、その後、他のデータ項目（例えば、システムによって受け取られた、個々の検査結果に関係する数値データ項目や、レポート／記録データの個々の項目、例えばクレジットカード有効期限や航空券発券日）と同様の方法で処理される。 In another embodiment (see step 902 in FIG. 9), preprocessing the data includes a method of extracting data expressed in free-form text (step 902, described in more detail later in this document). For the purposes of this part of the discussion, one embodiment of the system and method for generating text includes means for extracting data expressed in various ways, including free-form text, using a text condenser attribute. The data items so extracted are then processed in the same way as other data items (e.g., numeric data items received by the system that relate to individual test results, or individual items of report or record data such as a credit card expiry date or an airline ticket issue date).

図７の工程７０２を参照すると、その後、追加の集約データ項目に、その集約データ項目に作用する他のルールによってデータを供給してもよい。追加の集約データ項目は、例えば、有意な値を有するデータ項目を含んでいてもよい。その場合、追加のルールが、追加の集約データ項目に適用される。ルールの一例は、追加の集約データ項目内の重要なデータ項目の数が閾値を超えたかどうかを判断することを含んでいてもよい。ルールの結果は、肯定的な検査結果を示してもよく、その場合、肯定的な又は他の検査結果を報告する適切なテキストが生成される。テキストは、集約データ項目を使用することにより、ケースごとにルールを必要とすることなく、柔軟にケースバイケースで生成してよい。 Referring to step 702 of FIG. 7, the additional aggregated data item may then be supplied with data by other rules that operate on the aggregated data item. The additional aggregate data item may include, for example, a data item having a significant value. In that case, additional rules are applied to the additional aggregated data items. An example of a rule may include determining whether the number of important data items in the additional aggregated data item has exceeded a threshold. The result of the rule may indicate a positive test result, in which case appropriate text is generated that reports a positive or other test result. Text may be generated flexibly on a case-by-case basis without using rules for each case by using aggregated data items.

図９を参照すると、自由形式テキストで表現されたデータ（例えば、臨床メモ、航空運賃表、不動産広告）を含むデータを異なるソースから抽出する工程（工程９０２）を含む、テキストを生成する方法の他の実施形態９００が示されている。この方法は、様々な手法で表現された関連する情報を含む自由テキストのブロックの解析を可能にする。自由テキストから抽出された情報（例えば、数値データ又は他の情報）が他のデータとともに解析され、結論又は判断に至る（工程９０４）。例えば、臨床メモは、自由テキストで表現された重要な情報を含んでいてもよく、また、病理学検査及び人口統計データと併せて解釈されなければならない。 Referring to FIG. 9, a method of generating text that includes extracting data (eg, clinical memos, airfare charts, real estate advertisements) that is expressed in free-form text from different sources (step 902). Another embodiment 900 is shown. This method allows the analysis of free text blocks containing relevant information expressed in various ways. Information extracted from the free text (eg, numeric data or other information) is analyzed along with other data to arrive at a conclusion or decision (step 904). For example, clinical notes may contain important information expressed in free text and must be interpreted in conjunction with pathological examinations and demographic data.

航空券発券環境では、自由テキストを解釈する必要性から生じる問題を解決しようとする発明者らの最初の試みは、「テキスト正規化属性」（ＴｅｘｔＮｏｒｍａｌｉｓａｔｉｏｎＡｔｔｒｉｂｕｔｅ：ＴＮＡ）と呼ばれる派生属性を生成することを含んでいた。ＴＮＡは、自由テキストを一連のキータームに変換する。「キーターム（ＫｅｙＴｅｒｍ）」は、自由テキストの断片を表す固有コードである。キータームは、可変の要素、例えば通貨値を含んでいてもよい。自由テキスト断片の幾つかの別形が単一のキータームにマッピングされてもよい。自由テキストから一連のキータームへのマッピングは、その自由テキストの標準的表現を提供する。 In the ticketing environment, the inventors' first attempt to solve the problem arising from the need to interpret free text generates a derived attribute called "Text Normalization Attribute" (TNA). It included that. TNA converts free text into a series of key terms. “Key Term” is a unique code representing a piece of free text. The key terms may include variable elements, such as currency values. Several variants of free text fragments may be mapped to a single key term. The mapping from free text to a series of key terms provides a standard representation of the free text.

ＴＮＡは、各キータームを、その多数の形態に従って、すなわち、そのキータームの別フレーズによって定義することを可能にした。派生属性の出力は、自由テキストから抽出されたキータームから成る、「凝縮された」又は「正規化された」テキストの文字列であった。図１は、典型的なテキストブロックと、ＴＮＡによって定義されたその「正規形」を表示するユーザインタフェースを示している。ＴＮＡは、図２に示されるように、本質的には、正規表現のキーワードへのマップである。 TNA allowed each key term to be defined according to its numerous forms, i.e. by another phrase of the key term. The output of the derived attribute was a string of “condensed” or “normalized” text consisting of key terms extracted from free text. FIG. 1 shows a user interface displaying a typical text block and its “normal form” defined by the TNA. A TNA is essentially a map of regular expressions to keywords, as shown in FIG.

関連するキータームは表として列挙され、キーワードごとに、マッチする正規表現のリストが存在した。次に、生テキストが、現在の検索位置から（位置的に）最も近いマッチを検索することによって、トークンのリストに変換された。ここでは、同じ位置から開始する複数のマッチは、マッチ長さによって選択される。ビルトインマッチャ（ｂｕｉｌｔ−ｉｎｍａｔｃｈｅｒ）は、「ＡＵＤ７５」などの通貨値を、可変要素を有するキータームと考えることのできる特別な金銭値トークンに変換する。 Related key terms were listed as a table, and for each keyword there was a list of matching regular expressions. The raw text was then converted into a list of tokens by searching for the (positionally) closest match from the current search position. Here, multiple matches starting from the same position are selected according to the match length. A built-in matcher converts a currency value, such as “AUD 75”, into a special monetary value token that can be considered a key term with variable elements.
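The conversion of raw text into a list of tokens can be sketched as follows. This is a reconstruction under stated assumptions, not the inventors' code: the keyword table `KEY_TERMS` is hypothetical, ties between matches starting at the same position are assumed to go to the longest match, and `MONEY` is a simple stand-in for the built-in monetary matcher that would misfire on text such as a month abbreviation followed by a day.

```python
import re

# Hypothetical keyword table: key term -> regular expressions for its variants.
KEY_TERMS = {
    "BT": [r"BEFORE DEPARTURE", r"PRIOR TO DEPARTURE"],
    "CX": [r"CANCELLATION", r"CANCEL"],
    "FOR": [r"FOR"],
}
# Stand-in for the built-in money matcher: currency code, then an amount.
MONEY = re.compile(r"\b([A-Z]{3}) ?(\d+(?:\.\d+)?)\b")


def normalize(text):
    """Convert raw text into a list of key-term tokens."""
    patterns = []
    for term, expressions in KEY_TERMS.items():
        for rx in expressions:
            patterns.append((term, re.compile(rf"\b(?:{rx})\b")))

    tokens, pos = [], 0
    while True:
        candidates = []  # (start, -length, token, end)
        for term, pattern in patterns:
            m = pattern.search(text, pos)
            if m:
                candidates.append((m.start(), -len(m.group()), term, m.end()))
        m = MONEY.search(text, pos)
        if m:
            token = f"MV<{m.group(1)},{m.group(2)}>"
            candidates.append((m.start(), -len(m.group()), token, m.end()))
        if not candidates:
            return tokens
        # Nearest match first; ties at the same start go to the longest match.
        _, _, token, end = min(candidates)
        tokens.append(token)
        pos = end
```

Applied to a fare clause that charges AUD 75 for cancellation before departure, this yields the token list `BT`, `MV<AUD,75>`, `FOR`, `CX`; text that matches no pattern is simply dropped, which is what makes the output a condensed form.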

正規化されたテキストが、（一つ以上の）所望の値、例えば、取引の金銭値（７５）及び通貨の「値」（ＡＵＤ）を抽出するために解析された。発明者らによって行われた実験において所望の値を抽出するのに使用された構文法（シンタックス）は、テキスト正規表現パターンマッチングアルゴリズムを使用する独自仕様のリップルダウン条件言語における構文法であった。図３は、正規化テキストから通貨及び値を抽出するために使用された埋め込み変数を有するコメントの二つの例を表示するユーザインタフェース画面を示している。 The normalized text was analyzed to extract the desired value (one or more), for example, the monetary value (75) of the transaction and the “value” (AUD) of the currency. The syntax used in the experiment conducted by the inventors to extract the desired value was the syntax in a proprietary ripple-down condition language that uses a text regular expression pattern matching algorithm. . FIG. 3 shows a user interface screen displaying two examples of comments with embedded variables used to extract currency and value from normalized text.

ＴＮＡは、知識ベースを構築することによって試されたが、その知識ベースでは、コメントは、所定の理由で航空券を再発行する費用を与える可変表現であった。ほとんど全ての場合において、
金額＝ＮｏｒｍＣａｔにおいて｛「ＣＸＢＴＦＯＲＭＶ＄」にマッチするコードにおける金額｝通貨＝｛ＮｏｒｍＣａｔにおいて「ＣＸＢＴＦＯＲＭＶ＄」にマッチするコードにおける通貨｝
のようなコメントを追加するための条件は、
ＮｏｒｍＣａｔがコードシーケンス「ＣＸＢＴＦＯＲＭＶ＄」を含む
であった。 The TNA was trialled by building a knowledge base in which the comments were variable expressions giving the cost of reissuing an airline ticket for a given reason. In almost all cases, the condition for adding a comment such as
Amount = {amount in code matching “CX BT FOR MV $” in NormCat} Currency = {currency in code matching “CX BT FOR MV $” in NormCat}
was
NormCat contains the code sequence “CX BT FOR MV $”

本質的には、同じマッチングシーケンスが３回、すなわち、可変コメントにおいて２回、それを追加するための条件において１回、書かれなければならなかった。 In essence, the same matching sequence had to be written three times, ie twice in the variable comment and once in the condition to add it.

このテキスト正規化プロセスを使用して、発明者らは、例えばオーストラリアなど一つの国で調査した運賃表のほとんどをうまく解析する知識ベースを構築することができたが、最も複雑な運賃表からデータを抽出するには、いくらかの機能強化が必要であった。しかし、別の国の運賃表のために新しいキーワード又はキータームを追加する必要があった場合は特に、知識ベースを維持することが困難であることを意味する問題が存在した。類似の問題は、他の状況でも、例えば、ある患者の検査結果の解釈に２人以上の臨床医からの臨床メモを含める必要がある場合にも生じた。 Using this text normalization process, the inventors were able to build a knowledge base that successfully analyzed most of the fare schedules studied in one country, such as Australia, but data from the most complex fare schedules Some enhancements were needed to extract the. However, there was a problem that meant it was difficult to maintain a knowledge base, especially when new keywords or key terms had to be added for another country's fare schedule. Similar problems have arisen in other situations, for example, when the interpretation of a patient's test results requires the inclusion of clinical notes from two or more clinicians.

ＴＮＡに伴う問題は、以下のように説明することができる。 The problem with TNA can be explained as follows.

Ａ．抽出情報の変更に対する感度
ＴＮＡに新しいキーワードを追加した結果、コメント内の変数及びルール内の条件が、もはや意図したようには評価されない可能性があった。例えば、運賃表が、テキスト
．．．ＢＥＦＯＲＥＤＥＰＡＲＴＵＲＥＢＵＴＷＩＴＨＩＮ２４ＨＯＵＲＳＯＦＳＣＨＥＤＵＬＥＤＦＬＩＧＨＴＴＩＭＥＣＨＡＲＧＥＡＵＤ７５ＦＯＲＣＡＮＣＥＬＬＡＴＩＯＮ．．．（出発前であって予定出発時刻の２４時間以内の場合、キャンセルには７５オーストラリアドルを請求します）
を含んでいたと仮定する。 A. Sensitivity to changes in the extracted information
As a result of adding a new keyword to the TNA, variables in comments and conditions in rules might no longer be evaluated as intended. For example, suppose that a fare table contained the text
... BEFORE DEPARTURE BUT WITHIN 24 HOURS OF SCHEDULED FLIGHT TIME CHARGE AUD 75 FOR CANCELLATION ...

このテキストは、キーワード「ＢＥＦＯＲＥＤＥＰＡＲＴＵＲＥ」、「ＣＡＮＣＥＬＬＡＴＩＯＮ」、及び「ＦＯＲ」を含む。これらのキーワードは、図２に列挙されたキータームの同義語（別形又は正規代替表現）である。ＴＮＡは、正規表現を、関連するキーワードにマッピングする。 This text contains the keywords “BEFORE DEPARTURE”, “CANCELLATION”, and “FOR”. These keywords are synonyms (variants or alternative regular expressions) of the key terms listed in FIG. 2. The TNA maps regular expressions to the associated keywords.

ＴＮＡが、「ＢＥＦＯＲＥＤＥＰＡＲＴＵＲＥ」を「ＢＴ」で、「ＦＯＲ」を「ＦＯＲ」で、「ＣＡＮＣＥＬＬＡＴＩＯＮ」を「ＣＸ」で置き換え、加えて金銭値（ｍｏｎｅｔａｒｙｖａｌｕｅｓ）のビルトインマッチも置き換えた場合、正規化テキスト（すなわち、凝縮されたテキストの文字列である、派生属性の出力）は、
ＢＴＭＶ＜ＡＵＤ，７５＞ＦＯＲＣＸ
となる。 If the TNA replaces “BEFORE DEPARTURE” with “BT”, “FOR” with “FOR”, and “CANCELLATION” with “CX”, and also substitutes the built-in match for monetary values, the normalized text (i.e., the output of the derived attribute, which is a string of condensed text) becomes
BT MV <AUD, 75> FOR CX

This normalized text satisfies the condition
contains code sequence "BT MV $ FOR CX"

Now suppose we decide that the phrase "WITHIN 24 HOURS OF SCHEDULED FLIGHT TIME" is significant and needs to be captured. We must then add a new keyword for it, for example "W24HFT". Our normalized text now becomes
BT W24HFT MV <AUD, 75> FOR CX

しかし、新しい正規化テキストは、コードシーケンス内に「Ｗ２４ＨＦＴ」が存在するので、もはや元の条件を満たさない。すなわち、新しいキータームの追加は、意図していたものとは異なる評価をＴＮＡに容易に行わせうる。 However, the new normalized text no longer satisfies the original condition because “W24HFT” is present in the code sequence. That is, the addition of a new key term can easily cause the TNA to perform a different evaluation than intended.

キータームがテキスト正規化処理から除去される場合にも、全く同じ問題が生じる。 The exact same problem occurs when key terms are removed from the text normalization process.
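The sensitivity described in problem A can be reproduced with a minimal sketch. Everything here is an assumption made for illustration: the `normalize` and `contains_sequence` helpers, the `MONEY` pattern standing in for the built-in monetary match, and the simplification of the monetary placeholder in the condition to a bare `MV` token. It is not the TNA implementation described in the source.

```python
import re

# Simplified stand-in for the built-in monetary match, e.g. "AUD 75".
MONEY = r"\b[A-Z]{3} \d+\b"

def normalize(text, keywords):
    """Return the keyword codes found in text, in order of appearance."""
    hits = [(m.start(), "MV") for m in re.finditer(MONEY, text)]
    for phrase, code in keywords.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits += [(m.start(), code) for m in re.finditer(pattern, text)]
    return " ".join(code for _, code in sorted(hits))

def contains_sequence(normalized, sequence):
    """True if the codes in sequence occur contiguously in normalized."""
    tokens, wanted = normalized.split(), sequence.split()
    return any(tokens[i:i + len(wanted)] == wanted
               for i in range(len(tokens) - len(wanted) + 1))

TEXT = ("BEFORE DEPARTURE BUT WITHIN 24 HOURS OF SCHEDULED FLIGHT TIME "
        "CHARGE AUD 75 FOR CANCELLATION")
KEYWORDS = {"BEFORE DEPARTURE": "BT", "FOR": "FOR", "CANCELLATION": "CX"}
CONDITION = "BT MV FOR CX"

before = normalize(TEXT, KEYWORDS)

# Adding one keyword changes the normalized text and breaks the condition.
extended = {**KEYWORDS, "WITHIN 24 HOURS OF SCHEDULED FLIGHT TIME": "W24HFT"}
after = normalize(TEXT, extended)

print(before, contains_sequence(before, CONDITION))
print(after, contains_sequence(after, CONDITION))
```

The condition is written against the exact sequence of codes, so any keyword inserted into or removed from that sequence invalidates it, even though the fare rule itself has not changed.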

B. Redundancy of comments and conditions
As outlined in the example above, the same matching sequence had to be used three times to extract the value and the currency from the normalized text. This was inefficient both in processing time and in the time the user needed to create comments and conditions, and ultimately made the knowledge base harder to maintain than necessary.

C. Sensitivity to keyword renaming
For example, if it is decided to change a keyword from "BT" to "BeforeTravel", the variables and conditions that used that keyword again no longer apply. This is similar to problem A, but it is more easily avoided because renaming a keyword is a cosmetic change, whereas adding a new keyword or deleting an existing one is a more fundamental change to the text normalization process.

Thus, previous attempts to solve the problem of preprocessing data in free-form text suffered from an inability to cope with changed keywords and from inefficiency in defining comments and conditions. These limitations were found when attempting to address the problem in the context of the airline ticketing example and the log file example outlined above.

今からＩＴサポートサービスの例を取り上げて、以下のログファイル断片について検討する。 Now consider the following log file fragment, taking an example of an IT support service.

A text normalization process using the TNA filters these log entries and reduces them to
PM DC
Here the first log entry is encoded as "PM" and the third as "DC" (WARNING Could not disconnect client), while the second and fourth (informational) entries are ignored.

A rule identifying a false-positive (that is, not significant) DC alarm might use the condition
contains code sequence "PM DC"

However, if the TNA is now changed to include a new item, such as a backup (BCK) event, the resulting normalized text becomes
PM BCK DC

すると、ＤＣ警報が偽陽性であったことを示す条件は、もはや正しく評価されない。 The condition indicating that the DC alarm was a false positive is then no longer correctly evaluated.

したがって、先の航空券発券の例で説明したものと同じＴＮＡの制約が、ここでも当てはまる。 Therefore, the same TNA restrictions as described in the previous ticketing example apply here as well.

The embodiment of FIG. 9 provides a new tool, known as the "Text Condenser Attribute" (TCA), that incorporates both Key Terms and Key Concepts. Combining key terms and key concepts in a single tool overcomes the problems caused by adding or deleting keywords. In addition, because a keyword is an object shared by both terms and concepts, it can be renamed without affecting the rules applied to the aggregated data items (for example, step 702 of FIG. 7). Furthermore, because the tool extracts the key concepts as derived attributes in their own right, there is much less need to repeat conditions and variables.

FIG. 10 shows an exemplary user interface for a TCA. An "Attribute", or "Primary Attribute", is one of the basic elements of a rule condition or other expression. Each attribute has a name and an associated value or, in some cases, a series of values (for example, when several time-series values are associated with the attribute). An attribute represents the data-value element of a rule condition: for example, a low-level data item such as a single allergen marker, or a higher-level aggregated data item such as a pollen item. The other elements of a rule condition are arithmetic, text or logical operators, or other expressions that relate attributes and their values so as to form a Boolean expression. For example, the rule condition "some pollen is high" comprises the attribute "pollen" (an aggregated data item) and the logical expression "some X is high", in which the variable "X" has been replaced by the value pollen.

A "case" is a collection of attributes and their values that is presented to the expert system for interpretation. The preprocessor takes a complex case (that is, a case with a large number of attributes, attributes carrying large amounts of free-form text data, or attributes carrying long sequences of data items) and reduces its complexity by adding aggregated data items (higher-level, or "derived", attributes) to the case, so that the case can be used more easily and more generally in rule conditions and interpretive reports.

The Text Condenser Attribute (TCA) is one such derived attribute. A Text Condenser Attribute defines a set of keywords (or "key terms") together with a set of key concepts, or "Derived Matches" (see FIG. 11). Each key concept or derived match consists of:
(a) a target (in practice, another derived attribute in the knowledge base or expert system);
(b) an extraction expression (which determines the value of the derived attribute for the form that matched); and
(c) a list of "Matching Forms" (a matching form is a sequence of key terms).

This embodiment evaluates a TCA on a block of text as follows:
(a) The text is normalized into a sequence of keywords.
(b) The normalized text is parsed by each derived match in order to provide values for the key concepts. For each derived match, the longest of the matching forms that match (if any) is taken. This is known in the literature as a "greedy" pattern match.
(c) For each derived match in which a matching form found a match, the expression defined for that derived match is applied, and the result becomes the attribute value of the derived attribute corresponding to that key concept.
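Steps (b) and (c) above can be sketched as follows, assuming the text has already been normalized into a list of `(code, value)` tokens. The `DerivedMatch` class, the `find_form` and `evaluate` helpers, the token layout and the keyword code "AT" are hypothetical names introduced for this illustration; only the "$(1)" expression is handled, and the source does not specify how the actual system implements the match.

```python
from dataclasses import dataclass

@dataclass
class DerivedMatch:
    target: str           # name of the derived attribute to populate
    expression: str       # extraction expression; only "$(1)" is handled here
    matching_forms: list  # each matching form is a list of keyword codes

def find_form(codes, form):
    """Return the index at which form occurs contiguously in codes, or -1."""
    for i in range(len(codes) - len(form) + 1):
        if codes[i:i + len(form)] == form:
            return i
    return -1

def evaluate(tokens, derived_matches):
    """Return {target: value} for every derived match that finds a form."""
    codes = [code for code, _ in tokens]
    case = {}
    for dm in derived_matches:
        # Try longer forms first so that the longest matching form is taken.
        for form in sorted(dm.matching_forms, key=len, reverse=True):
            start = find_form(codes, form)
            if start < 0:
                continue
            matched = tokens[start:start + len(form)]
            if dm.expression == "$(1)":
                money = [value for code, value in matched if code == "MV"]
                if money:
                    case[dm.target] = money[0]
            break
    return case

# Normalized form of the fare text, with the monetary value kept on its token.
tokens = [("BT", None), ("W24HFT", None), ("MV", ("AUD", 75)),
          ("FOR", None), ("CX", None)]

derived_matches = [
    DerivedMatch("CxBt", "$(1)", [["BT", "MV", "FOR", "CX"],
                                  ["BT", "W24HFT", "MV", "FOR", "CX"]]),
    DerivedMatch("CxAt", "$(1)", [["AT", "MV", "FOR", "CX"]]),
]

result = evaluate(tokens, derived_matches)
print(result)
```

Because each derived match carries several alternative matching forms, a newly added keyword can be accommodated by adding one more form rather than by rewriting every condition that inspects the normalized text.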

Consider an example of parsing an airline fare schedule in which every associated expression is "$(1)". Systems 1 and 3 (see FIGS. 4 and 6) interpret this as "return the first monetary-value token found in the matched text". Other extraction expressions are discussed later.

上記のプロセスは、ケース内で参照された属性（例えば、上記のチケット再発行の例における「カテゴリ」）のためのサンプルシーケンス内のサンプルの全てにわたって適用することができる。「サンプルシーケンス」は、任意の属性用の複数の値からなる、順序付けられ、刻時されたリストである。サンプルシーケンス内の各値は、日付及び時間に関連付けられる。このようにして、ＴＣＡは、ＴＣＡのための、また付随する派生属性の各々のためのサンプルシーケンスを生成する。少なくとも一つの非空白値を含むサンプルシーケンスが、ケースに導入される。図１２は、属性「カテゴリ」（Ｃａｔｅｇｏｒｙ）のための値を有する例示的なケースを示しており、その場合、ＴＣＡのための値は、「ＴＣＡ」と呼ばれ、派生属性のための値は、「ＣｘＢｔ」、「ＣｘＡｔ」、「ＲｉＯｂ１」、及び「ＲｉＲｔｃ」と呼ばれる。 The above process can be applied across all of the samples in the sample sequence for the attribute referenced in the case (eg, “category” in the ticket reissue example above). A “sample sequence” is an ordered and timed list of values for a given attribute. Each value in the sample sequence is associated with a date and time. In this way, the TCA generates a sample sequence for the TCA and for each of the associated derived attributes. A sample sequence containing at least one non-blank value is introduced into the case. FIG. 12 shows an exemplary case with a value for the attribute “Category”, where the value for TCA is called “TCA” and the value for the derived attribute is , “CxBt”, “CxAt”, “RiOb1”, and “RiRtc”.

Using a TCA resolves the problems with the TNA described above by:
(a) allowing keywords to be added and deleted safely;
(b) reducing redundancy in comments and conditions; and
(c) allowing keywords to be renamed.

A. Keywords can be added and deleted safely
When defining a matching form in a derived match (that is, a key concept), the user is prompted to provide an example of the raw text to be matched, so that some example raw text accompanies each matching form (see FIG. 13). The example the user provides must produce a match to the matching form. If the user changes the keywords (for example, by adding a keyword) in such a way that a normalized example no longer matches its matching form, the user is warned about this (for example, by the derived match being shown in a different color, or by some other means of alerting the user).

For example, if a keyword "-" with the matching phrase "-" is added to the set of keywords, the derived matches fail, as shown in FIG. 15. To restore the derived matches, either the new keyword must be deleted or some of the matching forms must be changed so that they match the normalized versions of their examples.

このように、派生マッチにおける例は、そのキーコンセプトの定義のための背景（コンテキスト）を提供する点で、リップルダウン知識ベースにおけるコーナーストーンケース（ｃｏｒｎｅｒｓｔｏｎｅｃａｓｅ）に似ている。 Thus, the example in a derived match is similar to the cornerstone case in a ripple down knowledge base in that it provides the context for the definition of its key concept.
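The role of the stored examples can be illustrated with a small sketch: every matching form keeps its example raw text, and after any keyword change the examples are normalized again to find forms that no longer match. The helper names, the keyword code "CHG" and the example sentence are assumptions for illustration only; the keyword "-" is the one mentioned in the text.

```python
import re

def normalize(text, keywords):
    """Return the list of keyword codes found in text, in order of appearance."""
    hits = []
    for phrase, code in keywords.items():
        hits += [(m.start(), code) for m in re.finditer(re.escape(phrase), text)]
    return [code for _, code in sorted(hits)]

def matches(codes, form):
    """True if form occurs contiguously within codes."""
    form = list(form)
    return any(codes[i:i + len(form)] == form
               for i in range(len(codes) - len(form) + 1))

def stale_forms(forms_with_examples, keywords):
    """Return the forms whose stored example no longer matches them."""
    return [form for form, example in forms_with_examples
            if not matches(normalize(example, keywords), form)]

keywords = {"CANCELLATION": "CX", "BEFORE DEPARTURE": "BT", "CHARGE": "CHG"}
forms_with_examples = [
    (("CX", "BT", "CHG"), "CANCELLATION BEFORE DEPARTURE - CHARGE APPLIES"),
]

ok = stale_forms(forms_with_examples, keywords)

# Adding the keyword "-" changes the normalized example, so the form is flagged.
with_dash = {**keywords, "-": "-"}
flagged = stale_forms(forms_with_examples, with_dash)

print(ok, flagged)
```

A user interface could use the flagged list to highlight the affected derived matches, which is the kind of warning the text describes.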

B. Less redundancy in comments and conditions
The derived attributes of a TCA can be used directly in the variables in comments and in conditions. Such a condition simply asserts that the derived attribute is present in the case. For example, the condition
CxBt is available
could be used to add the comment shown in FIG. 16.

C. Keywords can be renamed
Because a keyword's name is merely an object reference shared by the keyword and the matching forms in the derived matches, including the derived matches in the TCA together with the keywords makes the system robust to keyword renaming. For example, if the keyword "BT" is renamed "BeforeTravel", our matching forms are updated automatically, as shown in FIG. 17.
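The effect of sharing keyword objects, rather than keyword names, can be sketched as follows. The `Keyword` class and the `show` helper are hypothetical and only illustrate the idea of a shared object reference; they are not taken from the source.

```python
class Keyword:
    """A keyword object; matching forms hold references to it, not its name."""

    def __init__(self, name, phrases):
        self.name = name
        self.phrases = phrases

def show(form):
    """Render a matching form using the current names of its keywords."""
    return " ".join(keyword.name for keyword in form)

bt = Keyword("BT", ["BEFORE DEPARTURE"])
mv = Keyword("MV", [])  # built-in monetary value keyword
for_kw = Keyword("FOR", ["FOR"])
cx = Keyword("CX", ["CANCELLATION"])

# The matching form stores the keyword objects themselves.
form = [bt, mv, for_kw, cx]
before = show(form)

# Renaming the keyword is visible through every form that references it.
bt.name = "BeforeTravel"
after = show(form)

print(before)
print(after)
```

A condition stored as the string "BT MV FOR CX" would have to be edited by hand after the rename, whereas a form stored as object references needs no change.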

Other advantages of the TCA
Different extraction expressions
The illustrated examples of derived matches show the extraction of monetary values from the normalized text of a fare schedule. There may be other kinds of information that need to be extracted from free-form text such as a fare schedule. Examples (in the airline ticketing scenario) include:
(a) whether a key phrase occurred; and/or
(b) dates.

Continuing the airline ticketing example, if a matching form contains one or more dates (which, like monetary values, appear automatically as keywords), the expression "@(i)" can be used to extract the i-th date. To handle key phrases, we use the expression "?" to indicate that the derived attribute should take the value "True" when a match exists. An example of a TCA that uses both of these expressions (that is, one that extracts a date and a Boolean value) is shown in FIG. 18. FIG. 19 shows how these Boolean and date values for the derived attributes appear in the exemplary case.
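The three extraction expressions mentioned in the text, "$(i)", "@(i)" and "?", could be interpreted along the following lines. This is a sketch under stated assumptions: the `apply_expression` function, the `(code, value)` token layout and the keyword codes "MV" and "DATE" for the built-in monetary and date matches are hypothetical, and the behavior when the requested index is out of range is a guess.

```python
import re
from datetime import date

def apply_expression(expression, matched_tokens):
    """Apply an extraction expression to the tokens of a matched form.

    "?"    -> True, because a match exists
    "$(i)" -> the i-th monetary value in the matched tokens (1-based)
    "@(i)" -> the i-th date in the matched tokens (1-based)
    """
    if expression == "?":
        return True
    parsed = re.fullmatch(r"([$@])\((\d+)\)", expression)
    if parsed is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    wanted = "MV" if parsed.group(1) == "$" else "DATE"
    values = [value for code, value in matched_tokens if code == wanted]
    index = int(parsed.group(2)) - 1
    # Assumed behavior: no value when the index is out of range.
    return values[index] if 0 <= index < len(values) else None

matched = [("BT", None), ("DATE", date(2010, 12, 24)),
           ("MV", ("AUD", 75)), ("FOR", None), ("CX", None)]

money = apply_expression("$(1)", matched)
when = apply_expression("@(1)", matched)
present = apply_expression("?", matched)
missing = apply_expression("@(2)", matched)

print(money, when, present, missing)
```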

Tooltips
When a user sees a derived attribute and its value in a case, it may not be clear why it is there, that is, which part of the raw text it represents. To help with this, in one embodiment the raw text that gave rise to the derived attribute and its matched value is provided as a tooltip (as illustrated in FIG. 20).

In a long report consisting of several report sections (each with an optional heading), the order in which the report sections are presented matters to the end user (for example, a physician, an airline auditing issued tickets, or a real-estate professional or buyer/seller). That is, the end user wants to see the most important report sections near the top of the report. What makes one report section more important than another, however, depends on the particular case being interpreted. It is therefore useful to order designated report sections using rules that operate on the data in each case. Some other report sections, such as a summary report section that always appears at the top of the report, must have a fixed position. The user may therefore be able to define a mixture of fixed and variable report section ordering.

アレルギーの報告は、可変レポートセクション順序付けが必要とされる可能性のある分野である。花粉（ｐｏｌｌｅｎ）、食品（ｆｏｏｄ）、ダニ（ｍｉｔｅ）、カビ（ｍｏｕｌｄ）、及び動物（ａｎｉｍａｌ）アレルゲン検査結果についてのコメントに対応して、少なくとも５つの別個のレポートセクションが存在する。与えられた患者について食品アレルギー検査結果が最も重要である場合、食品レポートセクションが他の４つの前にくるべきである、などというようになる。最も重要ではない検査結果に対応するレポートセクションは、他のものの後に配置されるべきである。更に、固定のレポートセクション、すなわち、レポートの先頭に位置する要約レポートセクション、及び通常はレポートの末尾に位置する勧告レポートセクションもある。 Allergy reporting is an area where variable report section ordering may be required. There are at least five separate report sections corresponding to comments on pollen, food, mite, mold, and animal allergen test results. If food allergy test results are most important for a given patient, the food report section should come before the other four, and so on. The report section corresponding to the least important test result should be placed after the others. In addition, there is a fixed report section, a summary report section located at the beginning of the report, and a recommendation report section usually located at the end of the report.

As a result, the system provides the operator 38 with a means of defining a "derived attribute" for each variable report section, using the rule syntax to assign a value corresponding to the desired report section ordering. In the allergy example above there are five derived attributes, for example "pollen_order", "food_order", "mite_order", "mould_order" and "animal_order". pollen_order is defined as the highest value among all the pollen data items, and likewise for the others. The derived attribute "pollen_order" is associated with the pollen report section. For each case, the values of the five derived attributes are calculated and the corresponding report sections are ordered according to those values. For example, if a case has the data items and values
grass = 50, birch = 20 (pollen)
wheat = 5, soy = 15 (food)
mould = 2
mite = 1
cat = 62, dog = 49 (animal)
then the report sections appear in the order
animal, pollen, food, mould, mite
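The ordering in this example can be reproduced with a short sketch. The `order_sections` function, the dictionary layout and the section names "summary" and "recommendations" for the fixed sections are assumptions for illustration; the data values are those given in the example.

```python
# Data items of the example case.
case = {"grass": 50, "birch": 20, "wheat": 5, "soy": 15,
        "mould": 2, "mite": 1, "cat": 62, "dog": 49}

# Which data items feed each variable report section.
sections = {
    "pollen": ["grass", "birch"],
    "food": ["wheat", "soy"],
    "mite": ["mite"],
    "mould": ["mould"],
    "animal": ["cat", "dog"],
}

def order_sections(case, sections, first=("summary",), last=("recommendations",)):
    """Order variable sections by their *_order value, between fixed sections."""
    # Each *_order derived attribute is the highest value among its data items.
    order_value = {name: max(case.get(item, 0) for item in items)
                   for name, items in sections.items()}
    variable = sorted(order_value, key=order_value.get, reverse=True)
    return list(first) + variable + list(last)

report_order = order_sections(case, sections)
print(report_order)
```

The fixed sections are simply placed before and after the sorted variable sections, which matches the mixture of fixed and variable ordering described above.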

In some embodiments, the system may provide at least one of the following:
• a Ripple Down rule system as the underlying technology for managing the very large knowledge bases required;
• a mechanism for generating encoded information from the knowledge base, the encoded output being machine instructions used, for example, to control a workflow engine that controls laboratory workflows such as autovalidation and reflexive testing;
• a natural-language syntax for constructing rule conditions;
• the insertion into comments of variables that are evaluated for the particular case being interpreted. Variables may be defined using aggregated data items.

Inference
Like a human expert, an expert system needs to reduce complex data to a form in which rules can be applied to data having the features to which those rules are applicable, rather than to a large data set to which the rules may or may not apply. A rule-based system therefore needs a program that selects the relevant rules, for example a program that determines which rules have conditions matching the attributes of the case or cases presented.

推論エンジン（ｉｎｆｅｒｅｎｃｅｅｎｇｉｎｅ）は、そのようなプログラムであり、適切なルールをケースに適用し、したがって、ルールを使用して「判断」などの結果を推論するタスクを実行する。 An inference engine is such a program that applies the appropriate rules to a case and thus performs tasks that use the rules to infer a result, such as a “decision”.

この推論はしばしば、幾つかの環境において適切なルールに基づいている。しかし、他の環境では、ルールは変更される必要がある。したがって、知識ベースのルールの継続的な反復が存在する。解釈のために提示されたケースへの一つ以上のルールの適用可能性は、結果の解釈の正確性を決定する。 This inference is often based on rules that are appropriate in some circumstances. However, in other environments, the rules need to be changed. Thus, there is a continuous iteration of knowledge-based rules. The applicability of one or more rules to a case presented for interpretation determines the accuracy of the interpretation of the results.

Review, and possible correction, of the interpretations generated by the knowledge base determines a set of "rejected" cases. These are presented to the expert who maintains the knowledge base, who either accepts the original interpretation or creates new rules so that the knowledge base gives the corrected interpretation to subsequent cases.

Like a human expert, a rule-based expert system needs the data to be presented to it in terms of the relevant significant features, so that it can reason from those features. If it reasoned from complex raw data, not only would the number of specific rules required become unmanageable but, even once those rules had been built, it would be unable to interpret any new variation encountered in the complex data presented. Reasoning from more generalized features in the data makes the interpretation process itself more generally applicable and hence more robust.

In high-transaction environments, expert systems can play a vital role in applying human expertise to provide rapid interpretation of raw data. For example, a pathology laboratory may need to provide interpretive reports for tens of thousands of patients per day, far beyond the capacity of the small number of pathologists the laboratory can employ.

Multi-level inference
Multi-level inference is the process of successively generating higher-level abstractions (referred to as higher-level concepts) during the interpretation process.

Higher-level abstractions
Intermediate conclusions are used to provide higher-level abstractions that are used later in the inference process by other rules. The intermediate conclusions themselves do not appear in the final interpretation, but they affect the rules used to determine it.

In a preferred embodiment, there are two stages of multi-level inference:
1. the attribute level; and
2. the rule level.

Multi-level inference at the attribute level
An attribute may be defined in terms of other attributes according to some formula or expression (for example, an arithmetic expression, a Boolean expression or a set-membership operation).

これらは、オンライン情報システムから受け取った属性である「プライマリ」属性とは対照的に、「派生」属性と呼ばれる。 These are called “derived” attributes as opposed to “primary” attributes, which are attributes received from an online information system.

For example, suppose the job of a knowledge base is to flag every suspicious transaction so that a human expert can review it further, and not to flag transactions that are not suspicious, since flagging those would waste the human expert's time.

知識ベースによって解析されるトランザクションの例には、以下の三つのトランザクション（取引）が含まれる。 Examples of transactions analyzed by the knowledge base include the following three transactions (transactions).

あるケースにおけるこれらの金融取引でのプライマリ属性は、「会社」、「国」、「金額」、及び「以前のフラグ」から構成されていてもよい。 The primary attributes in these financial transactions in a case may consist of “company”, “country”, “amount”, and “previous flag”.

Derived attributes are defined as abstractions that make the resulting rules more general and more maintainable. For example, the following derived attributes are defined:
1. "Oil industry". This is true if the transaction involves a company in the oil industry. Baxxon is a member of the set of oil companies.
2. "Non-treaty country". This is true if the transaction involves one of a specific set of countries that are not party to a particular international treaty. Uginta and Tigeria are in this set.
3. "Borderline amount". This is true if the transaction amount is close to the minimum value at which a transaction automatically comes under scrutiny. In this example, amounts between $9500 and $9999 are considered borderline.
4. "Previous borderline count". This is the total number of transactions within the past week that were borderline.

これらの派生属性がマルチレベル推論プロセスによって評価されると、このケースは、以下のようになる。 When these derived attributes are evaluated by a multilevel inference process, this case becomes:

The multi-level inference in this example requires two inference analyses to determine all of the attributes:
1. The first analysis allows the attributes "oil industry", "non-treaty country" and "borderline amount" to be determined. Following this,
2. the second analysis allows the attribute "previous borderline count" to be calculated.
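The two analyses can be sketched as two functions, where the second depends on an attribute produced by the first. The transaction records below are hypothetical (the source's transaction table is not reproduced here), as are the field names, the `days_ago` field and the function names; the company and country names and the $9500 to $9999 range come from the text.

```python
OIL_COMPANIES = {"Baxxon"}
NON_TREATY_COUNTRIES = {"Uginta", "Tigeria"}

def first_pass(tx):
    """Derive the attributes that depend only on primary attributes."""
    return {
        **tx,
        "oil_industry": tx["company"] in OIL_COMPANIES,
        "non_treaty_country": tx["country"] in NON_TREATY_COUNTRIES,
        "borderline_amount": 9500 <= tx["amount"] <= 9999,
    }

def second_pass(current, earlier):
    """Derive the count, which needs borderline_amount from the first pass."""
    count = sum(1 for tx in earlier
                if tx["borderline_amount"] and tx["days_ago"] <= 7)
    return {**current, "previous_borderline_count": count}

# Hypothetical earlier transactions; the last one is older than one week.
earlier = [first_pass(tx) for tx in (
    {"company": "Baxxon", "country": "Uginta", "amount": 9600, "days_ago": 5},
    {"company": "Baxxon", "country": "Tigeria", "amount": 9900, "days_ago": 2},
    {"company": "Baxxon", "country": "Uginta", "amount": 9800, "days_ago": 20},
)]

current = first_pass(
    {"company": "Baxxon", "country": "Uginta", "amount": 9700, "days_ago": 0})
case = second_pass(current, earlier)

print(case)
```

The second function cannot run before the first, because "previous borderline count" is defined in terms of "borderline amount"; this dependency is what makes the inference multi-level.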

マルチレベル推論の他の例は、（参照により本明細書に組み込まれる）オーストラリア特許出願第２０１０９０４５４５号において、本発明の出願人及び発明者らによって提供されている。 Another example of multi-level reasoning is provided by the applicant and inventors of the present invention in Australian Patent Application 2010904545 (incorporated herein by reference).

属性レベルでのマルチレベル推論の適用を使用する派生属性は、単一の式又は表現に関して指定され、そのため、決定できるのは、プライマリ属性のみから導出できる派生属性に限られる。 Derived attributes that use multi-level inference application at the attribute level are specified with respect to a single expression or expression, so that only those derived attributes that can be derived from the primary attribute can be determined.

Multi-level inference at the rule level
Higher-level abstractions are made possible by:
1. derived attributes extracted from the primary attributes (as shown above); and
2. knowledge-base conclusions derived by the rule base.

ルールベース推論の第１のレベルでは、結論は、属性（プライマリ若しくは派生属性又はその両方）の点から定義された一つ以上のルールによって与えられる。 At the first level of rule-based reasoning, the conclusion is given by one or more rules defined in terms of attributes (primary and / or derived attributes).

第２のレベルでは、第１レベルの結論をルール条件内に組み込んだルールを使用して結論を与えてもよい。 At the second level, the conclusion may be given using a rule that incorporates the first level conclusion in the rule condition.

このルールレベルでのマルチレベル推論は、その後、必要に応じて、第３及びそれ以降のレベルに適用してもよい。 This multi-level inference at the rule level may then be applied to the third and subsequent levels as needed.

In the financial example above, the user may define a first-level conclusion "suspicious transaction" using the rule conditions:
• "borderline amount" is true
• "non-treaty country" is true
• "previous borderline count" is 2 or more
In this case, all of these conditions are true.

「疑わしい取引」が、単なる別の派生属性ではなく結論として定められる理由は、この「疑わしい取引」が、単一の式又は表現を用いて定められるものよりも複雑な概念であるからである。 The reason that "suspicious transaction" is defined as a conclusion rather than just another derived attribute is that this "suspicious transaction" is a more complex concept than that defined using a single formula or expression.

例えば、「疑わしい取引」を示す全く異なるシナリオが多数存在する可能性がある。各シナリオは、異なる一組のルール条件によって指定され、そのいずれもが、結論「疑わしい取引」を与える。 For example, there can be many completely different scenarios that indicate “suspicious transactions”. Each scenario is specified by a different set of rule conditions, all of which give the conclusion “suspicious transaction”.

逆に、「疑わしい取引」シナリオに非常に類似して見えるが、それを疑わしくないとする僅かな相違があるケースが存在する可能性もある。 Conversely, there may be cases where there appears to be very similar to a “suspicious transaction” scenario, but there is a slight difference that makes it not suspicious.

The user may now define a second-level conclusion "needs inspection" using the rule conditions:
• "suspicious transaction"; and
• "no previous flag".

ここでも、これらの条件は、このケースについてはともに真である。結論「以前のフラグなし」は、すでにフラグが付けられた取引についてのフラグ付けを回避し、それによって取引の重複した検査を回避するために含まれており、その目的は、ここでも、人間の専門家の時間を浪費しないためである。 Again, these conditions are both true for this case. The conclusion "no previous flag" is included to avoid flagging transactions that have already been flagged, thereby avoiding duplicate inspection of transactions, the purpose of which is again human. This is to avoid wasting specialist time.

The user can also define a third-level conclusion called "explanation". This conclusion has the value "This transaction of <amount> is suspicious because there were <previous borderline count> previous borderline transactions from company <company> in the last week, and the current transaction is from the non-treaty country <country>." The rule condition that gives this conclusion is simply "needs inspection".

この結論の目的は、なぜこのケースが知識ベースによって検査のためにフラグを付けられたのかについて、テキストの説明を人間の検査者に提供することである。コメント内の変数（＜＞記号によって示される）は、このケースにおける特定の属性値から評価される。 The purpose of this conclusion is to provide a human inspector with a text explanation of why this case was flagged for examination by the knowledge base. Variables in comments (indicated by <> symbols) are evaluated from specific attribute values in this case.

The user can also define a third-level conclusion called "queue", with the value "oil trade", using the rule conditions
"needs inspection", and
"oil company".
The purpose of this conclusion is to route the case to a particular queue that is reviewed by experts experienced in a particular kind of transaction (in this case, transactions by oil companies).

Once these conclusions have been evaluated by the multi-level inference process, the case and its rule-based interpretation are as follows.

全ての属性が定められた後は、これらの全ての結論を定めるために、三つの推論パスが必要とされる。 Once all attributes have been defined, three inference passes are required to determine all these conclusions.

In the first pass, the conclusion “suspicious transaction” is given. In the second pass, “inspection required” is given. In the third pass, “queue” and “explanation” are given.
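
The three passes can be illustrated with a minimal Python sketch. It is not the patented implementation: the `infer` function, the attribute names (`previous_borderlines`, `non_treaty`) and the first-level condition for “suspicious transaction” are assumptions made for illustration, and “no previous flag” and “oil company” are simply supplied as case data. Each pass evaluates every rule against only those conclusions established in earlier passes, so dependent conclusions appear one level at a time.

```python
from typing import Callable, Dict, List, Optional, Tuple

Facts = Dict[str, object]
# A rule is (conclusion name, function returning the conclusion's value or None).
Rule = Tuple[str, Callable[[Facts], Optional[object]]]

EXPLANATION = (
    "This transaction for {amount} is suspicious, because there were "
    "{previous_borderlines} previous borderline transactions from company "
    "{company} in the last week and the current transaction is from "
    "non-treaty country {country}."
)


def infer(case: Facts, rules: List[Rule], max_passes: int = 10) -> Tuple[Facts, int]:
    """Repeat inference passes until a pass adds no new conclusion."""
    facts = dict(case)
    passes = 0
    for _ in range(max_passes):
        new: Facts = {}
        for name, rule in rules:
            if name not in facts:
                value = rule(facts)  # sees only conclusions from earlier passes
                if value is not None:
                    new[name] = value
        if not new:
            break
        facts.update(new)
        passes += 1
    return facts, passes


RULES: List[Rule] = [
    # Hypothetical first-level condition, assumed for this sketch only.
    ("suspicious transaction",
     lambda f: True if f["previous_borderlines"] >= 2 and f["non_treaty"] else None),
    ("inspection required",
     lambda f: True if f.get("suspicious transaction") and f.get("no previous flag") else None),
    ("queue",
     lambda f: "oil transaction" if f.get("inspection required") and f.get("oil company") else None),
    ("explanation",
     lambda f: EXPLANATION.format_map(f) if f.get("inspection required") else None),
]

# Made-up case data for illustration.
case: Facts = {
    "amount": "AUD 9,900", "company": "Acme Oil", "country": "X",
    "previous_borderlines": 3, "non_treaty": True,
    "no previous flag": True, "oil company": True,
}

facts, passes = infer(case, RULES)
print(passes)           # number of passes that produced new conclusions
print(facts["queue"])
print(facts["explanation"])
```

With this case data, “suspicious transaction” is concluded in the first pass, “inspection required” in the second, and “queue” and “explanation” in the third.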

Applying multi-level inference at the rule level to a knowledge base that uses the Ripple Down Rules methodology allows a concept such as “suspicious transaction” to be defined as a conclusion that takes into account all scenarios indicating a suspicious transaction and excludes all scenarios that do not. This gives a broader view than that provided by one or more derived attributes using multi-level inference alone.

マルチレベル推論は、その専門家分野にとって適切な抽象化レベルにある概念を用いて知識ベースを定義することを可能にする。 Multi-level reasoning allows the knowledge base to be defined using concepts that are at the level of abstraction appropriate for the expert field.

属性レベルでは、マルチレベル推論は、オンライン情報システムから受け取った多かれ少なかれ恣意的な一組のプライマリ属性を、その分野に特有で適切な概念を表す派生属性へと前処理することを可能にする。 At the attribute level, multi-level reasoning allows a more or less arbitrary set of primary attributes received from an online information system to be pre-processed into derived attributes that represent concepts relevant to the field.

Pre-processing at the attribute level can also reduce data complexity, for example by extracting concepts from large blocks of free text, or by filtering and grouping many related attributes into higher-level attributes.

At the conclusion level, multi-level inference allows higher-level abstractions to be defined that represent the concepts a human expert would normally use when understanding the domain and justifying the interpretation of a particular scenario.

Using the Ripple Down methodology to define conclusions means that these higher-level abstractions can be refined over time, providing not only a natural representation of potentially very complex concepts but also a highly accurate and clear one.

Using a set of appropriate, domain-specific abstractions for a case, combined with Ripple Down's built-in cornerstone case validation procedure (described separately), makes it feasible for an expert to build and maintain a large, sophisticated and useful knowledge base.

Example 1
The first application is a leukemia report knowledge base, in which diagnosis is performed using hundreds of tests whose values are determined by a microarray of hundreds of protein expression or gene expression markers. An expert may build a diagnostic report knowledge base that identifies subsets of related markers, the diagnosis corresponding to each pattern, and comments on the significant subsets for inclusion in a text report to the referring physician.

アレイ検査結果は、知識ベースへの入力として、複数のデータ項目及び値ペアとして提供される。この例では、これらのデータ項目には、識別のためにＣＤ１〜ＣＤ１００のラベルが付される。これらのラベルは、アレイに対する１００個の要素を表している。現実世界の例は、数百のマーカーを含むことがある。 The array test results are provided as a plurality of data items and value pairs as input to the knowledge base. In this example, these data items are labeled CD1 to CD100 for identification. These labels represent 100 elements for the array. Real-world examples may include hundreds of markers.

この例では、データ値のうちの一つの値が５０未満であることは、その患者のサンプルについては、そのマーカーに対応する抗体の発現がないことを意味する。５０よりも大きな値は、（他のマーカーの値によっては）有意である可能性がある。１００よりも高いマーカーの値は、有意な発現を示す。 In this example, if one of the data values is less than 50, it means that there is no expression of the antibody corresponding to the marker for the patient sample. A value greater than 50 may be significant (depending on the value of other markers). Marker values higher than 100 indicate significant expression.

Diagnoses of particular types of leukemia can be inferred from the values of designated subsets of the 100 data values.

For example, a diagnosis of B-cell chronic lymphocytic leukemia (B-CLL) can be inferred when at least two of CD1, CD2, CD3, CD4 and CD5 show significant expression. This diagnosis is supported when any of CD6, CD7, CD8, CD9 and CD10 shows significant expression, but these markers do not by themselves diagnose B-CLL.

Similarly, a diagnosis of acute myeloid leukemia (AML) can be inferred when at least two of CD11, CD12, CD13, CD14 and CD15 show significant expression. This diagnosis is supported when any of CD16, CD17, CD18, CD19 and CD20 shows significant expression, but these markers do not by themselves diagnose AML.

Five aggregate data items are populated from the received data items, as specified by the following example structure.
1. “BCLL diagnosis” is populated by data items CD1, CD2, CD3, CD4 and CD5.
2. “BCLL support” is populated by data items CD6, CD7, CD8, CD9 and CD10.
3. “AML diagnosis” is populated by data items CD11, CD12, CD13, CD14 and CD15.
4. “AML support” is populated by data items CD16, CD17, CD18, CD19 and CD20.
5. “Leukemia” is populated by the aggregate data items “BCLL diagnosis”, “AML diagnosis”, “BCLL support” and “AML support”.

これらのデータ項目と１５個の集約データ項目の階層関係を与える構造の一態様の概略が、図２１に示されている。幾つかの実施形態では、構造の下位の階層がポピュレートされると、上位の階層における値又は特性が計算される。構造は、デバイス１のメモリ２０やハードドライブ２（又は他のデータ記憶ユニット）等に保存されて、ＣＰＵ４によって解釈されてもよい。 FIG. 21 shows an outline of an aspect of a structure that gives a hierarchical relationship between these data items and 15 aggregated data items. In some embodiments, when the lower hierarchy of the structure is populated, the value or property in the upper hierarchy is calculated. The structure may be stored in the memory 20 of the device 1, the hard drive 2 (or other data storage unit) or the like and interpreted by the CPU 4.

The following ranges may also be defined:
(a) “no detection”, defined as the constant 50; and
(b) “high”, defined as the constant 100.

Further aggregate data items are populated by applying the following rules, which represent the significant data items in each set.
1. “Significant BCLL diagnosis” is populated by the rule “BCLL diagnosis in range [> high]”.
2. “Significant BCLL support” is populated by the rule “BCLL support in range [> no detection]”.
3. “Significant AML diagnosis” is populated by the rule “AML diagnosis in range [> high]”.
4. “Significant AML support” is populated by the rule “AML support in range [> no detection]”.
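
As an illustration only, the aggregate data items and range rules might be sketched in Python as follows. The dictionary representation (attribute name mapped to value) and the helper names `markers`, `populate`, `above` and `evaluate` are assumptions, not part of the described system. The values for CD3, CD4, CD5, CD7 and CD8 are those quoted for patient “A” in this example; all other values are made-up placeholders below the “no detection” level. Support sets are filtered against “no detection”, in line with the statement that a supporting data item needs a value greater than 50.

```python
from typing import Dict, List

NO_DETECTION = 50  # named constant "no detection"
HIGH = 100         # named constant "high"

Aggregate = Dict[str, float]  # attribute name -> value, original information retained


def markers(first: int, last: int) -> List[str]:
    """Return the labels CD<first> .. CD<last>."""
    return [f"CD{i}" for i in range(first, last + 1)]


# Which primary data items populate each aggregate data item.
STRUCTURE: Dict[str, List[str]] = {
    "BCLL diagnosis": markers(1, 5),
    "BCLL support": markers(6, 10),
    "AML diagnosis": markers(11, 15),
    "AML support": markers(16, 20),
}


def populate(case: Dict[str, float], names: List[str]) -> Aggregate:
    """Collect the named data items of a case into one aggregate data item."""
    return {name: case[name] for name in names if name in case}


def above(aggregate: Aggregate, threshold: float) -> Aggregate:
    """Keep only the members whose value lies in the range [> threshold]."""
    return {name: value for name, value in aggregate.items() if value > threshold}


def evaluate(case: Dict[str, float]) -> Dict[str, Aggregate]:
    agg = {name: populate(case, names) for name, names in STRUCTURE.items()}
    return {
        "significant BCLL diagnosis": above(agg["BCLL diagnosis"], HIGH),
        "significant BCLL support": above(agg["BCLL support"], NO_DETECTION),
        "significant AML diagnosis": above(agg["AML diagnosis"], HIGH),
        "significant AML support": above(agg["AML support"], NO_DETECTION),
    }


# Placeholder case: 10 everywhere except the five values quoted for patient "A".
patient_a = {name: 10 for name in markers(1, 20)}
patient_a.update({"CD5": 260, "CD3": 190, "CD4": 150, "CD7": 90, "CD8": 60})

significant = evaluate(patient_a)
print(sorted(significant["significant BCLL diagnosis"]))
print(sorted(significant["significant BCLL support"]))
```

Because each aggregate keeps the attribute-to-value mapping of its members, a single rule can count, filter or sort the members without a separate rule per marker.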

The BCLL diagnosis comment is given by the following pseudo-text:
“Pan-B cell antigen expression on <significant BCLL diagnosis>, with co-expression of <significant BCLL support>, typical of B-cell chronic lymphocytic leukemia (B-CLL).”

変数「＜有意なＢＣＬＬ診断＞」は、有意なＢＣＬＬのデータ項目の名称と値を列挙するための指示であり、変数＜有意なＢＣＬＬ支持＞についても同様である。この実施形態では、列挙される名称と値は、最も有意な属性が最初に列挙されるように、データ項目値の降順で順序付けられる。 The variable “<significant BCLL diagnosis>” is an instruction for enumerating the names and values of data items of significant BCLL, and the same applies to the variable <significant BCLL support>. In this embodiment, the listed names and values are ordered in descending order of the data item values so that the most significant attributes are listed first.

The BCLL diagnosis rule triggers generation of the BCLL diagnosis comment as follows:
“number of significant BCLL diagnosis ≥ 2”, and
“number of significant BCLL support ≥ 1”.

すなわち、コメントは、集合「有意なＢＣＬＬ診断」内に二つ以上のデータ項目が存在し、且つ、集合「有意なＢＣＬＬ支持」内に一つ以上のデータ項目が存在する場合に生成される。 In other words, a comment is generated when two or more data items exist in the set “significant BCLL diagnosis” and one or more data items exist in the set “significant BCLL support”.

As a second example, the AML diagnosis comment is given by the following pseudo-text:
“Consistent with AML antigen expression based on positive <significant AML diagnosis, as name>, with co-expression of <significant AML support, as name>. Possible M2 classification is suspected.”

データ項目「＜有意なＡＭＬ診断、名称として＞」は、有意なＡＭＬのデータ項目の名称のみを列挙するための知識ベースに対する指示であり、変数＜有意なＡＭＬ支持、名称として＞についても同様である。 The data item “<significant AML diagnosis, as name>” is an instruction to the knowledge base for enumerating only the names of significant AML data items, and the same applies to the variable <significant AML support, as name>. is there.

この実施形態では、列挙される名称と値は、値がコメントに示されないとしても、最も有意なデータ項目が最初に列挙されるように、データ項目値の降順で順序付けられる。 In this embodiment, the listed names and values are ordered in descending order of data item values so that the most significant data items are listed first, even if the values are not shown in the comments.

The AML diagnosis rule that triggers generation of the AML comment may be:
“number of significant AML diagnosis ≥ 2”, and
“number of significant AML support ≥ 1”.

The comment is generated when there are two or more data items in the set “significant AML diagnosis” and at least one data item in the set “significant AML support”. In other words, there must be two or more data items with a value greater than 100 in the set “AML diagnosis” and at least one data item with a value greater than 50 in the set “AML support”.

Consider the following test results for a sample from patient “A”.

These results are sent to one embodiment of the knowledge base. The knowledge base evaluates the aggregate data items as follows, thereby assessing expression:
• “Significant BCLL diagnosis” evaluates to “CD5, CD3 and CD4”.
• “Significant BCLL support” evaluates to “CD7 and CD8”.
• “Significant AML diagnosis” and “significant AML support” both evaluate to null (empty).

その後、知識ベースは、上述のように定められたルールに従って解釈を行う。「有意なＢＣＬＬ診断」集合内には三つの要素が存在し、「有意なＢＣＬＬ支持」集合内には二つの要素が存在するので、このケースではＢＣＬＬルールが適用可能である。 Thereafter, the knowledge base interprets according to the rules determined as described above. Since there are three elements in the “significant BCLL diagnosis” set and two elements in the “significant BCLL support” set, the BCLL rule is applicable in this case.

The knowledge base evaluates the variables “<significant BCLL diagnosis>” and “<significant BCLL support>” in the BCLL comment, and then gives the evaluated comment:
“Pan-B cell antigen expression on CD5 (260), CD3 (190) and CD4 (150), with co-expression of CD7 (90) and CD8 (60), typical of B-cell chronic lymphocytic leukemia (B-CLL).”
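
The evaluation of the comment variables for patient “A” can be sketched in Python as below. This is a hedged illustration rather than the actual pseudo-text engine: `join_naturally`, `render` and `bcll_comment` are hypothetical helper names, the English wording of the template is one possible rendering of the comment, and plain string replacement stands in for whatever variable mechanism the knowledge base uses. The marker values are the ones quoted in the evaluated comment.

```python
from typing import Dict, List, Optional

Aggregate = Dict[str, float]  # attribute name -> value


def join_naturally(parts: List[str]) -> str:
    """Format ['a', 'b', 'c'] as 'a, b and c'."""
    if len(parts) <= 1:
        return "".join(parts)
    return ", ".join(parts[:-1]) + " and " + parts[-1]


def render(aggregate: Aggregate, with_values: bool = True) -> str:
    """List members in descending order of value, most significant first."""
    ordered = sorted(aggregate.items(), key=lambda item: item[1], reverse=True)
    parts = [f"{name} ({value})" if with_values else name for name, value in ordered]
    return join_naturally(parts)


BCLL_TEMPLATE = (
    "Pan-B cell antigen expression on <significant BCLL diagnosis>, "
    "with co-expression of <significant BCLL support>, "
    "typical of B-cell chronic lymphocytic leukemia (B-CLL)."
)


def bcll_comment(diagnosis: Aggregate, support: Aggregate) -> Optional[str]:
    """Apply the trigger rule, then substitute the two comment variables."""
    if len(diagnosis) >= 2 and len(support) >= 1:
        return (BCLL_TEMPLATE
                .replace("<significant BCLL diagnosis>", render(diagnosis))
                .replace("<significant BCLL support>", render(support)))
    return None


diagnosis = {"CD3": 190, "CD4": 150, "CD5": 260}
support = {"CD7": 90, "CD8": 60}
print(bcll_comment(diagnosis, support))
```

Calling `render(diagnosis, with_values=False)` gives the names-only form that corresponds to the “as name” variant of a variable, still ordered by value.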

第２の例のために、以下のような患者「Ｂ」の検査結果について検討する。 For the second example, consider the following patient “B” test results.

These results are sent to the knowledge base. The knowledge base first evaluates the aggregate data items as follows:
• “Significant BCLL diagnosis” and “significant BCLL support” both evaluate to null (empty).
• “Significant AML diagnosis” evaluates to “CD12 and CD11”.
• “Significant AML support” evaluates to “CD19, CD20 and CD18”.

その後、知識ベースは、上記のルールに従って解釈を行う。「有意なＡＭＬ診断」集合内には二つの要素が存在し、「有意なＡＭＬ支持」集合内には三つの要素が存在するので、このケースではＡＭＬルールが適用可能である。 The knowledge base then interprets according to the above rules. Since there are two elements in the “significant AML diagnosis” set and three elements in the “significant AML support” set, the AML rule is applicable in this case.

The knowledge base then gives the comment:
“Consistent with AML antigen expression based on positive CD12 and CD11, with co-expression of CD19, CD20 and CD18. Possible M2 classification is suspected.”

Example 2
Another application is an allergy report knowledge base, for which there may be 500 or more IgE tests that can be performed. The task of the allergy specialist is to advise the referring physician on which subset of the tests performed has significant result values for the patient, which test values may not be significant, and which tests need to be followed up (including which of the 500 possible tests should also be performed as follow-up).

一つの解決例は、以下の通りである。 One solution is as follows.

• From the complete collection of possible data item names, group the data item names into multiple aggregate data items, based on domain-specific rules (such as a significant pollen attribute) and case-specific rules.

・集約データ項目の特性のいずれかを、更なるルール及び／又はコメントの基礎として使用する。例えば、ある集約データ項目の要素の数が一定の数を上回った場合、又はその集合が特定の要素を含む場合、特定のコメントを与える。 Use any of the characteristics of the aggregate data item as the basis for further rules and / or comments. For example, when the number of elements of a certain aggregate data item exceeds a certain number, or when the set includes a specific element, a specific comment is given.

• Use one or more aggregate data items as variables in a comment, for example “Dog, cat and peanut allergies are significant”. In this comment, the phrase “dog, cat and peanut” is the evaluation of an aggregate data item consisting of the allergens that are significant for this case. The general form of this comment can be “{significant allergens} allergies are significant”, where the set {significant allergens} is itself defined by rules.

・任意で、データ項目の値をコメントに含める。例えば、「犬（１０２．３）、猫（５６．４）及びピーナッツ（４３．５）アレルギーが有意である」。 • Optionally, include the value of the data item in the comment. For example, “dog (102.3), cat (56.4) and peanut (43.5) allergies are significant”.

・コメント内に出現する集約データ項目内のデータ項目を適切に順序付ける。例えば、コメント内で最も重要な属性が最初に出現するように、その事例における属性値の降順で順序付ける。 • Properly order data items within aggregated data items that appear in comments. For example, order the attribute values in the case in descending order so that the most important attribute appears first in the comment.

• Automatically format the data items into a naturally constructed sentence that is consistent with the rest of the report. For example, if three attributes are significant, the set may be formatted as “dog, cat and peanut allergies”, and if only two attributes are significant, as “dog and cat allergies”.

• Allow a grouping of data items to be defined in terms of previous groupings. For example, a new aggregate data item can be the difference, union or intersection of one set and another, or the result of any other set operation. This makes it possible to define a hierarchy of sets. For example, the difference between the set “appropriate tests” and the set “ordered tests” identifies the set of appropriate tests that have not yet been ordered.

• Allow comments to be defined that use, as appropriate, either individual data items or the aggregate data item that contains them. For example, if the list of individual data items is too long for a comment in a particular case, the term “food allergy” is used instead of “peanut, soy, milk, egg and peach allergies”.

• Similarly, allow comments to be defined that use, as appropriate, the name of a super-aggregate data item instead of the names of its subsets. For example, if the list of individual aggregate data items is too long for a comment in a particular case, the term “inhalant allergy” is used instead of “pollen, animal, mold, … allergies”.
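
Several of the points above (ordering by value, natural list formatting, falling back to a group name when a list is too long, and deriving one set from others) can be sketched together in Python. The helper names, the `max_items` cut-off of three, the food result values and the test names are all assumptions for illustration; only the dog, cat and peanut values are taken from the example comment.

```python
from typing import Dict, Iterable


def natural_list(names: Iterable[str]) -> str:
    """Format names as 'a', 'a and b' or 'a, b and c'."""
    items = list(names)
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def allergy_phrase(significant: Dict[str, float], group_name: str,
                   max_items: int = 3) -> str:
    """Name each allergen, most significant first, unless the list is too long."""
    if len(significant) > max_items:
        return f"{group_name} allergy"  # use the aggregate's own name instead
    ordered = sorted(significant, key=significant.get, reverse=True)
    return f"{natural_list(ordered)} allergy"


three = {"cat": 56.4, "peanut": 43.5, "dog": 102.3}
two = {"cat": 56.4, "dog": 102.3}
# Made-up values; only the list length matters for the fallback.
foods = {"peanut": 43.5, "soy": 12.0, "milk": 20.1, "egg": 15.3, "peach": 9.8}

print(allergy_phrase(three, "significant allergen"))
print(allergy_phrase(two, "significant allergen"))
print(allergy_phrase(foods, "food"))

# A new aggregate defined by a set operation on earlier aggregates.
appropriate_tests = {"dog", "cat", "peanut", "house dust mite"}  # hypothetical
ordered_tests = {"dog", "cat"}                                   # hypothetical
not_yet_ordered = appropriate_tests - ordered_tests
print(sorted(not_yet_ordered))
```

The set difference plays the role of the aggregate “appropriate tests that have not yet been ordered”; union and intersection would be written with `|` and `&` in the same way.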

Pseudo-text
The following table defines some of the features of the pseudo-text described above, where s, t, x, y and z refer to data items, whether primary or derived.

文字Ｘ、Ｙ、Ｚは、特に、集約データ項目（派生データ項目の一種）を指す。文字ａ、ｂは、数字又は名称付き定数を指す。文字ｐは、ブール式を指す。 The letters X, Y, Z particularly refer to aggregated data items (a type of derived data item). The letters a and b refer to numbers or named constants. The letter p refers to a Boolean expression.

Example 3
The third application is a knowledge base used to interpret an airline fare table and determine the conditions under which a ticket can be reissued, such as:
(a) the departure city and destination city; and
(b) how much of a penalty is applicable.

第２の知識ベースは、再発行チケットを解釈して、実際に支払われた料金並びに出発都市及び行先都市を決定するために使用される。 The second knowledge base is used to interpret the reissue ticket to determine the actual fee paid as well as the departure and destination cities.

第３の知識ベースは、他の二つの知識ベースの出力を解釈し、再発行チケットが運賃表の条件に従っているかどうかを判断する。 The third knowledge base interprets the output of the other two knowledge bases to determine whether the reissue ticket complies with fare table conditions.

とりあえず第１の知識ベースについて検討すると、運賃表内の自由形式テキストの断片は、以下のようなものとすることができる。 Considering the first knowledge base for the time being, the free-form text fragment in the fare table can be as follows:

A Text Condenser Attribute (TCA) is used to pre-process this fare table fragment. Some of the keywords in the TCA are:
(a) “CX”, representing “CANCELLATION FEE” and other similar phrases;
(b) “BT”, representing “BEFORE DEPARTURE” and other similar phrases; and
(c) “<AUD n>”, representing a monetary value with a variable amount, such as “AUD110” or “AUD75”.

ＴＣＡ内のキーコンセプトは、派生属性「ＣａｎｃｅｌｌａｔｉｏｎＦｅｅＢｅｆｏｒｅＴｒａｖｅｌ」であり、これは、この一連のキーワード「ＣＸＢＴ＜ＡＵＤＮ＞」から導出される可変の金銭値Ｎを有するものと定義される。この例では、属性「ＣａｎｃｅｌｌａｔｉｏｎＦｅｅＢｅｆｏｒｅＴｒａｖｅｌ」は、数値１１０を有する。 The key concept in the TCA is a derived attribute “CancellationFeeBeforeTravel”, which is defined as having a variable monetary value N derived from this series of keywords “CX BT <AUD N>”. In this example, the attribute “CancellationFeeBeforeTravel” has a numerical value 110.
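
A minimal Python sketch of this kind of text condensing is given below. It is an assumption-laden illustration, not the TCA itself: the regular expressions, the extra phrase variants (“CHARGE”, “PRIOR TO”), the function names and the sample fare text are all hypothetical. Phrases are first replaced by their keywords, monetary amounts are normalized to “<AUD n>”, everything else is discarded, and the derived attribute is then read from the keyword sequence.

```python
import re
from typing import Optional

# Phrase patterns mapped to TCA keywords (the variants are illustrative guesses).
PHRASES = [
    (re.compile(r"\bCANCELL?ATION\s+(?:FEE|CHARGE)\b", re.IGNORECASE), "CX"),
    (re.compile(r"\b(?:BEFORE|PRIOR\s+TO)\s+DEPARTURE\b", re.IGNORECASE), "BT"),
]
MONEY = re.compile(r"\bAUD\s*(\d+)\b")
KEYWORD = re.compile(r"\bCX\b|\bBT\b|<AUD \d+>")


def condense(text: str) -> str:
    """Reduce free-form fare text to a normalized sequence of keywords."""
    for pattern, keyword in PHRASES:
        text = pattern.sub(keyword, text)
    text = MONEY.sub(r"<AUD \1>", text)
    return " ".join(KEYWORD.findall(text))  # drop everything that is not a keyword


def cancellation_fee_before_travel(normalized: str) -> Optional[int]:
    """Derived attribute: the amount N in the keyword sequence 'CX BT <AUD N>'."""
    match = re.search(r"CX BT <AUD (\d+)>", normalized)
    return int(match.group(1)) if match else None


# Hypothetical fare text, invented for this sketch.
fare_text = "CANCELLATION FEE BEFORE DEPARTURE IS AUD110 PER TICKET."
normalized = condense(fare_text)
print(normalized)
print(cancellation_fee_before_travel(normalized))
```

Keeping the normalization and the derived attribute as separate steps means that new phrasings only require a new pattern, while the rule that reads “CX BT <AUD N>” stays unchanged.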

The knowledge base has rules for outputting this and other fare table conditions as conclusions in a standardized form, for example “CancellationFeeBeforeTravel = 110”. These standardized forms are input to the third knowledge base, which compares the fare table conditions with the output of the second knowledge base summarizing the actual travel and the fee paid for the reissued ticket. The third knowledge base then interprets the fare table conditions and the summarized details of the reissued ticket to determine whether the ticket was reissued in accordance with the fare table conditions.

Example 4
The fourth application is a knowledge base used to monitor a series of log file entries and determine whether and when to send an alert to IT support staff.

An expert system can play a fundamental role in assisting IT support staff who monitor and interpret log files. It can help to extract, from this very large volume of free-form text data, the specific alerts, warnings or trends that the IT support staff regard as important and that indicate some preventive or corrective action by the IT support staff is warranted.

ログファイルを解釈するときのより具体的な困難は、「偽陽性」、すなわち、実際には重要でない警報又は他の警報ほど重要でない警報の問題である。 A more specific difficulty when interpreting log files is the problem of “false positives”, ie alarms that are not really important or less important than other alarms.

例えば、最初の「メモリ不足」警報は有意であるかもしれないが、続いて繰り返される同じ警報は、すでに警告された問題を指しているだけのこともあり（偽陽性）、さもなければ、実際に全く新しい問題であることもある（真陽性）。これら二つの警報の間に長い時間差が存在する場合、第２の警報は、真陽性である可能性がより高い。 For example, the first “out of memory” alert may be significant, but the same alert repeated subsequently may only point to a problem that has already been warned (false positive); It may be a completely new problem (true positive). If there is a long time difference between these two alarms, the second alarm is more likely to be true positive.

別の例は、高い率のページフォールトに起因する「ディスクスラッシング」警報の場合である。これは重要であるが、場合によっては、より重要な問題の症状にすぎず、その問題とは、メモリ不足状態であるかもしれない。この場合、「ディスクスラッシング」警報より前の「メモリ不足」警報の存在は、第２の警報が、先行する「メモリ不足」警報が存在しない場合ほどには重要でないことを意味する。 Another example is the case of a “disk thrashing” alarm due to a high rate of page faults. This is important, but in some cases it is only a symptom of a more important problem, which may be an out of memory condition. In this case, the presence of an “out of memory” alert prior to the “disk thrashing” alert means that the second alert is not as important as if there was no preceding “out of memory” alert.

Another example is the “client disconnect failure” alert. For most of the day, this is a significant alert that warrants immediate investigation by the IT support staff. However, if the alert is logged at a particular time of day, for example 2 a.m., when the system is known to be offline for daily preventative maintenance (PM), the alert may not be significant.

警報の有意性と適切な応答を判断するルールは、警報の種類だけでなく、それらの順序、頻度、互いに対するタイミング、更には絶対的タイムスタンプさえも考慮しなければならない。 Rules that determine the significance and appropriate response of an alarm must consider not only the types of alarms but also their order, frequency, timing relative to each other, and even absolute time stamps.

Once the significance of an alert has been determined, the expert system can decide on the appropriate action to take. For the expert system to interpret a log file, therefore, pre-processing is needed to classify individual log entries, or sequences of log entries, into sequences of key terms or key concepts. A complex, essentially free-form text log file is thereby reduced to a normalized form, from which simpler, higher-level atomic data items can be extracted and used in rule conditions.

The following is an example of a potential alert situation, in which a log entry indicates that a user (client) could not be disconnected. However, if this warning occurs after the system has started preventative maintenance, the potential alert is regarded as a false positive and no alert is sent to the IT support staff.

一連のログファイルエントリの以下の例について検討する。 Consider the following example of a series of log file entries.

To pre-process these log file entries before they are interpreted by the knowledge base, a TCA is built. The keywords in the TCA are:
(a) “PM”, representing “Preventative Maintenance” and other similar phrases;
(b) “WARN”, representing “WARNING” and other similar phrases; and
(c) “DC”, representing “Could not disconnect client” and other similar phrases.

これらのログファイルエントリについてのＴＣＡの値は、正規化テキスト形式「ＰＭＷＡＲＮＤＣ」である。このＴＣＡのキーコンセプトの一つは、派生属性「Ａｌｅｒｔ」（警報）である。これは、正規化テキスト形式が「ＷＡＲＮ」を含む場合（本例でも含む）にブール値「真」を有するように定義される。 The TCA value for these log file entries is in normalized text format “PM WARN DC”. One of the key concepts of this TCA is a derived attribute “Alert” (alarm). This is defined to have the Boolean value “true” when the normalized text format includes “WARN” (also in this example).

知識ベースは、属性Ａｌｅｒｔの値が「真」である場合にワークフロー動作「Ｓｅｎｄａｌｅｒｔｅｍａｉｌ」（警報電子メールを送る）を追加するためのルールを有する。このワークフロー動作は、通知先のＩＴサポートスタッフの電子メールアドレスを含み、電子メールヘッダは、警報の要約を提供し、電子メール本文は、警報の詳細を記述する。 The knowledge base has a rule for adding a workflow action “Send alert email” (send an alert email) when the value of the attribute Alert is “true”. This workflow operation includes the email address of the IT support staff to be notified, the email header provides a summary of the alert, and the email body describes the details of the alert.

ＴＣＡの別のキーコンセプトは、派生属性「ＦａｌｓｅＰｏｓｉｔｉｖｅ」（擬陽性）である。これは、正規化テキスト形式が「ＰＭＷＡＲＮＤＣ」を含む（本例でも含む）場合にブール値「真」を有するように定義される。 Another key concept of TCA is the derived attribute “FalsePositive” (false positive). This is defined to have the Boolean value “true” when the normalized text format includes “PM WARN DC” (also in this example).

知識ベースは、属性ＦａｌｓｅＰｏｓｉｔｉｖｅの値が「真」である場合に、ワークフロー動作「Ｓｅｎｄａｌｅｒｔｅｍａｉｌ」を削除するための別の後続するルールを有する。 The knowledge base has another subsequent rule to delete the workflow action “Send alert email” if the value of the attribute FalsePositive is “true”.

前処理段階では、これら二つの派生属性Ａｌｅｒｔ及びＦａｌｓｅＰｏｓｉｔｉｖｅがケースに追加され、双方とも値「真」を有する。 In the pre-processing stage, these two derived attributes Alert and FalsePositive are added to the case, both having the value “true”.

During the knowledge base inference stage, the alert workflow action is added to the interpretation by the rule with the condition “Alert is true”. However, the alert workflow action is then removed by the subsequent rule with the condition “FalsePositive is true”. The net result is that the alert workflow action is not present in the interpretation, and so no alert email is sent to the IT support staff.
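
The two stages can be sketched in Python as follows. This is only an illustration under stated assumptions: the log entries are invented, the regular expressions and the function names `condense`, `derive` and `interpret` are hypothetical, and an ordered pair of `if` statements stands in for the knowledge base's add rule and its subsequent remove rule.

```python
import re
from typing import Dict, List

# Phrase patterns mapped to TCA keywords.
KEYWORDS = [
    (re.compile(r"preventative\s+maintenance", re.IGNORECASE), "PM"),
    (re.compile(r"\bwarning\b", re.IGNORECASE), "WARN"),
    (re.compile(r"could\s+not\s+disconnect\s+client", re.IGNORECASE), "DC"),
]
SEND_ALERT = "Send alert email"


def condense(entries: List[str]) -> str:
    """Normalize log entries to keywords, in order of appearance."""
    tokens: List[str] = []
    for entry in entries:
        hits = [(m.start(), keyword)
                for pattern, keyword in KEYWORDS
                for m in pattern.finditer(entry)]
        tokens.extend(keyword for _, keyword in sorted(hits))
    return " ".join(tokens)


def derive(normalized: str) -> Dict[str, bool]:
    """Pre-processing stage: add the derived attributes to the case."""
    return {
        "Alert": "WARN" in normalized.split(),
        "FalsePositive": "PM WARN DC" in normalized,
    }


def interpret(attributes: Dict[str, bool]) -> List[str]:
    """Inference stage: one rule adds the action, a later rule removes it."""
    actions: List[str] = []
    if attributes["Alert"]:
        actions.append(SEND_ALERT)
    if attributes["FalsePositive"] and SEND_ALERT in actions:
        actions.remove(SEND_ALERT)
    return actions


# Invented log entries for illustration.
entries = [
    "02:00:00 INFO Preventative Maintenance started",
    "02:03:17 WARNING Could not disconnect client 42",
]
print(condense(entries))
print(interpret(derive(condense(entries))))
```

If the same warning is logged without a preceding maintenance entry, the normalized text is “WARN DC”, FalsePositive is false, and the alert action remains in the interpretation.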

ここまで実施形態を説明してきたので、幾つかの実施形態が、以下の利点の幾つかを有しうることが分かるだろう。 Having described the embodiments so far, it will be appreciated that some embodiments may have some of the following advantages.

・単一のテキストレポートを生成する目的で、多数のデータ項目（幾つかの実施形態では、自由形式テキストとして提示されるテキスト結果を含む）を処理することが可能である。 Multiple data items (including text results presented as free-form text in some embodiments) can be processed for the purpose of generating a single text report.

・そのレポートは、各ケースにとって重要なデータ項目を、適切な順序で、言語学的に自然な構文法で提示する。 The report presents the data items that are important to each case, in the appropriate order, in a linguistically natural syntax.

・具体的なレポートバリエーションの数は、データ項目の可能なサブセットの数及び各サブセット内で可能な順序付けの数に起因して、本質的に無限である。 The number of specific report variations is essentially infinite due to the number of possible subsets of data items and the number of possible orderings within each subset.

・特定のレポートを決定する具体的なルール条件の数も、ケース中の属性及びそれらの値のパターンの数に起因して、本質的に無限である。 The number of specific rule conditions that determine a particular report is also essentially infinite, due to the number of attributes and their value patterns in the case.

・それにも関わらず、ルールは前処理段階の結果である派生属性に基づいているので、専門家は、管理可能な数のルールを用いて、知識ベースを構築及び保守することができる。 Nevertheless, since the rules are based on derived attributes that are the result of the preprocessing stage, the expert can build and maintain the knowledge base using a manageable number of rules.

・多数の属性及びそれに対応する多数のレポートバリエーションを管理できるエキスパートシステムが提供される。 An expert system is provided that can manage multiple attributes and corresponding multiple report variations.

具体的な実施形態に示される本発明には、広範に説明されたような本発明の主旨及び範囲から逸脱することなく、多くの変形及び／又は変更をなし得ることが理解されるだろう。したがって、本発明の実施形態は、全ての点において、限定的ではなく、例示的なものと見なされるべきである。 It will be understood that the invention shown in the specific embodiments can be subject to many variations and / or modifications without departing from the spirit and scope of the invention as broadly described. Accordingly, the embodiments of the invention are to be considered in all respects only as illustrative and not restrictive.

DESCRIPTION OF SYMBOLS
1 System
2 Computer-readable medium
3 System
4 Central processor
8 Data item
10 Data receiver
11 Data transmitter
12 Remote site
14 Network
20 Memory
22 Data source
24 Aggregate data item
26 Text information
28 Screen
30 Printer
32 Workstation
36 Receiver
37 Information system
38 User

Claims

A method for generating information from a plurality of personal data items, performed by a knowledge-based system for inferring conclusions, the method comprising:
(A) a population step of populating an aggregated data item using at least one of the plurality of personal data items, wherein:
each personal data item includes original information including an attribute, which is an identifier of the personal data item, and a value;
the aggregated data item is a form of derived attribute;
the derived attribute is a conclusion estimated by a rule-based knowledge base and represents a transformation from a set of personal data items to a single data item having a value; and
the value of the derived attribute is an aggregate value including an attribute-to-value mapping for each personal data item in the set of personal data items, such that the derived attribute forms a single data item suitable for inference by the rule-based knowledge base, the single data item retaining the original information for each of the plurality of personal data items and receiving queries from the entire knowledge base to extract information about the personal data items;
(B) an application step of applying, to the aggregated data item, a rule applied by the rule-based knowledge base for inference, wherein:
the rule includes a set operation performed on the set of personal data items, so that a single rule can query the plurality of personal data items as a single data item rather than relying on multiple rules for each personal data item or on a combination of multiple rules, the set operation including one or more of (i) querying, (ii) iterating, (iii) identifying a subset, (iv) identifying a specific personal data item, (v) sorting, (vi) comparing a set of personal data items with another set of personal data items, and (vii) other set operations; and
(C) a generation step of generating information using the aggregated data item, wherein:
the generation step is performed by the rule-based knowledge base, which generates information by applying one or more of the rules to at least one of the aggregated data items;
the generated information belongs to one or more of (i) text information and (ii) machine instructions; and
the generation step includes one or more of the following sub-steps, so that the rule-based knowledge base can generate information on a plurality of personal data items by applying a rule including a set operation to a derived attribute:
i. a sub-step of including in the information an identifier of one or more personal data items that populate the aggregated data item;
ii. a sub-step of including in the information the value associated with one or more personal data items that populate the aggregated data item.

The method of claim 1, wherein the generating step further comprises a sub-step of determining an order of identifiers for one or more of (a) personal data items and (b) aggregated data items.

The method according to claim 1 or 2, wherein the text information is grammatically correct.

The method according to claim 1, wherein the one or more rules are applied by the rule-based knowledge base when generating information from at least a part of a ripple down rule knowledge system.

The method according to any one of claims 1 to 4, wherein the plurality of personal data items includes at least unstructured data items having a value consisting of one or more of (a) free-form text and (b) a sequence of free-form values.

The method of claim 5, wherein:
the derived attribute is a text summarizer attribute; and
the text summarizer attribute maps one or more regular expressions in the free-form text to one or more of the following, so that the unstructured data items are processed from complex data items into one or more simple data items, each having a value, for ease of interpretation:
(A) a standard representation of the free-form text that is a sequence of keywords in a normalized form; and
(B) a number of atomic data items, each atomic data item representing a key concept enumerated in the free-form text data item, the value of each atomic data item being one of (i) a Boolean value, (ii) a finite enumeration, and (iii) a numeric value.

The method of claim 6, wherein, to make interpretation easier, one or more of the following steps occur during iteration:
(A) populating aggregated data items with at least one of a plurality of derived attributes;
(B) applying one or more rules including one or more set operations to the aggregated data item; and
(C) mapping regular expressions in free-form text to one or more of the standard representation and the number of atomic data items.

A method for generating information from a plurality of personal data items, performed by a knowledge-based system for inferring conclusions, the method comprising:
(A) an application step of applying a rule to an aggregated data item for inference, wherein:
each personal data item comprises original information including an attribute, which is an identifier of the personal data item, and a value;
the aggregated data item is a form of derived attribute;
the derived attribute is a conclusion estimated by a rule-based knowledge base and represents a transformation from a set of personal data items to a single data item having a value; and
the value of the derived attribute is an attribute-to-value mapping for each personal data item in the set of personal data items, such that the derived attribute forms a single data item suitable for inference by the rule-based knowledge base, the single data item holding the original information about each of the plurality of personal data items and receiving queries from the entire knowledge base to extract information about the personal data items;
(B) an evaluation step of evaluating the result of one or more rules using one or more aggregated data items, each including one or more personal data items, wherein:
the one or more rules are applied by the rule-based knowledge base; and
the one or more rules include a set operation performed on a set of personal data items, so that a single rule can query the plurality of personal data items as a single data item rather than relying on multiple rules for each personal data item or on a combination of multiple rules, the set operation including one or more of (i) querying, (ii) iterating, (iii) identifying a subset, (iv) identifying a specific personal data item, (v) sorting, (vi) comparing a set of personal data items with another set of personal data items, and (vii) other set operations; and
(C) a generation step of generating the information according to the result, wherein:
the generation step is performed by the rule-based knowledge base, which generates information by applying one or more of the rules to at least one of the aggregated data items;
the generated information belongs to one or more of (i) text information and (ii) machine instructions; and
the generation step includes one or more of the following sub-steps, so that the rule-based knowledge base can generate information on a plurality of personal data items by applying a rule including a set operation to a derived attribute:
i. a sub-step of including in the information an identifier of one or more personal data items that populate the aggregated data item;
ii. a sub-step of including in the information the value associated with one or more personal data items that populate the aggregated data item.

A method for generating information from a plurality of personal data items, performed by a knowledge-based system for inferring conclusions, the method comprising:
(A) a receiving step of receiving a conceptual representation of information including an interpretation portion representing an operation on an aggregated data item including a plurality of personal data items, wherein:
each personal data item includes original information including an attribute, which is an identifier of the personal data item, and a value;
the aggregated data item is a form of derived attribute;
the derived attribute is a conclusion estimated by a rule-based knowledge base and represents a transformation from a set of personal data items to a single data item having a value; and
the value of the derived attribute is an aggregate value including an attribute-to-value mapping for each personal data item in the set of personal data items, such that the derived attribute forms a single data item suitable for inference by the rule-based knowledge base, the single data item retaining the original information for each of the plurality of personal data items and receiving queries from the entire knowledge base to extract information about the personal data items;
(B) an application step of applying, to the aggregated data item, a rule applied by the rule-based knowledge base for inference, wherein:
the rule includes a set operation performed on the set of personal data items, so that a single rule can query the plurality of personal data items as a single data item rather than relying on multiple rules for each personal data item or on a combination of multiple rules, the set operation including one or more of (i) querying, (ii) iterating, (iii) identifying a subset, (iv) identifying a specific personal data item, (v) sorting, (vi) comparing a set of personal data items with another set of personal data items, and (vii) other set operations; and
(C) a generation step of generating the information from the interpretation portion, wherein:
the information is generated by applying one or more rules to at least one of the aggregated data items;
the generated information belongs to one or more of (i) text information and (ii) machine instructions; and
the generation step includes one or more of the following sub-steps, so that the rule-based knowledge base can generate information on a plurality of personal data items by applying a rule including a set operation to a derived attribute:
i. a sub-step of including in the information an identifier of one or more personal data items that populate the aggregated data item;
ii. a sub-step of including in the information the value associated with one or more personal data items that populate the aggregated data item.

A system for generating information from a plurality of personal data items, the system comprising:
(A) a computing device;
(B) a preprocessor that is executable in the computing device and populates an aggregated data item using at least one of the plurality of personal data items, wherein:
each personal data item includes original information including an attribute, which is an identifier of the personal data item, and a value;
the aggregated data item is a form of derived attribute;
the derived attribute is a conclusion estimated by a rule-based knowledge base and represents a transformation from a set of personal data items to a single data item having a value; and
the value of the derived attribute is an aggregate value including an attribute-to-value mapping for each personal data item in the set of personal data items, such that the derived attribute forms a single data item suitable for inference by the rule-based knowledge base, the single data item retaining the original information for each of the plurality of personal data items and receiving queries from the entire knowledge base to extract information about the personal data items;
(C) a rule-based knowledge base that infers by applying rules to the aggregated data item, wherein:
the rules include a set operation performed on the set of personal data items, so that a single rule can query the plurality of personal data items as a single data item rather than relying on multiple rules for each personal data item or on a combination of multiple rules, the set operation including one or more of (i) querying, (ii) iterating, (iii) identifying a subset, (iv) identifying a specific personal data item, (v) sorting, (vi) comparing a set of personal data items with another set of personal data items, and (vii) other set operations; and
(D) an information generator that generates information using the derived attribute, wherein:
the information generator forms at least part of a decision support system;
the generated information belongs to one or more of (i) text information and (ii) machine instructions; and
the generation of the information includes one or more of the following, so that the rule-based knowledge base can generate information on a plurality of personal data items by applying a rule including a set operation to a derived attribute:
i. including in the information an identifier of one or more personal data items that populate the aggregated data item;
ii. including in the information the value associated with one or more personal data items that populate the aggregated data item.

The system of claim 10, wherein the preprocessor can iteratively construct derived attributes that use conclusions previously generated in an iterative process.

The system according to claim 10 or 11, wherein the information generator can iteratively construct conclusions using rules that use previously defined conclusions in an iterative process.

The system according to claim 10, wherein the derived attribute is one or more of:
(A) an aggregated data item;
(B) a text summarizer attribute; and
(C) any other result of data preprocessing that extracts one or more superordinate concept data items from a plurality of data items, thereby reducing data complexity.

The system according to any one of claims 10 to 13, further comprising a receiver for receiving the plurality of personal data items,
wherein the receiver is one of (a) a handheld electronic device and (b) another computing device having processing power.

The system according to any one of claims 10 to 14, further comprising a builder capable of processing one or more of:
(A) an aggregated data item;
(B) one or more rules;
(C) other derived attributes;
(D) one or more conclusions;
(E) any other result of data preprocessing that extracts one or more superordinate concept data items from a plurality of data items, thereby reducing data complexity; and
(F) any other conceptual representation used in defining the interpretation portion in the generation of the information.

The system according to any one of claims 10 to 14, further comprising a transmitter for sending the generated information to one or more of (a) a machine and (b) a recipient.

A system for generating information from a plurality of personal data items, the system comprising:
(A) a rule-based knowledge base that infers by applying one or more rules to aggregated data items, wherein:
each personal data item comprises original information including an attribute, which is an identifier of the personal data item, and a value;
the aggregated data item is a form of derived attribute;
the derived attribute is a conclusion estimated by the rule-based knowledge base and represents a transformation from a set of personal data items to a single data item having a value; and
the value of the derived attribute is an aggregate value including an attribute-to-value mapping for each personal data item in the set of personal data items, such that the derived attribute forms a single data item suitable for inference by the rule-based knowledge base, the single data item retaining the original information for each of the plurality of personal data items and receiving queries from the entire knowledge base to extract information about the personal data items;
(B) an evaluator for evaluating the result of the one or more rules, wherein:
the one or more rules include a set operation performed on the set of personal data items, so that a single rule can query the plurality of personal data items as a single data item rather than relying on multiple rules for each personal data item or on a combination of multiple rules, the set operation including one or more of (i) querying, (ii) iterating, (iii) identifying a subset, (iv) identifying a specific personal data item, (v) sorting, (vi) comparing a set of personal data items with another set of personal data items, and (vii) other set operations; and
(C) an information generator for generating information according to the result, wherein:
the information is generated by applying the one or more rules to at least one aggregated data item;
the generated information belongs to one or more of (i) text information and (ii) machine instructions; and
the generation of the information includes one or more of the following, so that the rule-based knowledge base can generate information on a plurality of personal data items by applying a rule including a set operation to a derived attribute:
i. including in the information an identifier of one or more personal data items that populate the aggregated data item;
ii. including in the information the value associated with one or more personal data items that populate the aggregated data item.

A system for generating information, the system comprising:
(A) a receiver for receiving a conceptual representation of information including an interpretation portion representing an operation on an aggregated data item including a plurality of personal data items, wherein:
each personal data item includes original information including an attribute, which is an identifier of the personal data item, and a value;
the aggregated data item is a form of derived attribute;
the derived attribute is a conclusion estimated by a rule-based knowledge base and represents a transformation from a set of personal data items to a single data item having a value; and
the value of the derived attribute is an aggregate value including an attribute-to-value mapping for each personal data item in the set of personal data items, such that the derived attribute forms a single data item suitable for inference by the rule-based knowledge base, the single data item holding the original information for each of the plurality of personal data items and receiving queries from the knowledge base as a whole to extract information about the personal data items;
(B) a rule-based knowledge base that infers by applying rules to the aggregated data item, wherein:
the rules include a set operation performed on the set of personal data items, so that a single rule can query the plurality of personal data items as a single data item rather than relying on multiple rules for each personal data item or on a combination of multiple rules, the set operation including one or more of (i) querying, (ii) iterating, (iii) identifying a subset, (iv) identifying a specific personal data item, (v) sorting, (vi) comparing a set of personal data items with another set of personal data items, and (vii) other set operations; and
(C) an information generator for generating the information from the interpretation portion, wherein:
the information is generated by applying one or more rules to at least one of the aggregated data items;
the generated information belongs to one or more of (i) text information and (ii) machine instructions; and
the generation of the information includes one or more of the following, so that the rule-based knowledge base can generate information on a plurality of personal data items by applying a rule including a set operation to a derived attribute:
i. including in the information an identifier of one or more personal data items that populate the aggregated data item;
ii. including in the information the value associated with one or more personal data items that populate the aggregated data item.

A computer program comprising instructions for controlling a computer to carry out the method of claim 1.

A computer readable medium providing the computer program of claim 19.