MXPA97000867A

MXPA97000867A - Method and system to identify patients with diagnosis of depression, which are in rie

Info

Publication number: MXPA97000867A
Application number: MXPA/A/1997/000867A
Authority: MX
Inventors: Friedman Felix; Wong Bruce
Original assignee: Smithkline Beecham Corporation
Priority date: 1996-02-02
Filing date: 1997-02-03
Publication date: 1998-01-01

Abstract

A computer-implemented technique, including database processing, is used to identify at-risk patients in a declarations database. The technique includes processing patient information in the declarations database to find and extract information from statements for a group of patients with depression. Then, using the extracted information, a set of events relevant to depression is defined. The extracted information and the set of events are then processed to create event-level information, which is organized with respect to events rather than statements. A time window is defined to provide a time frame from which to judge whether events should be considered in subsequent processing, and a set of variables is defined as potential predictors of adverse health consequences. Subsequently, the event-level information is processed, using the time window and the set of variables, to generate an analysis file. Statistical analyses, such as logistic regression, are performed on the analysis file to generate a prediction model, where the prediction model is a function of a subset of the set of variables. Finally, the prediction model is used to output the at-risk patients, diagnosed with depression, who are likely to have adverse health consequences.

Description

METHOD AND SYSTEM TO IDENTIFY PATIENTS WITH DIAGNOSIS OF DEPRESSION, WHICH ARE AT RISK

BACKGROUND OF THE INVENTION

This invention relates to database processing techniques and, more particularly, to the identification of patients with depression who are at risk of adverse health consequences, using different database processing techniques.

Depression is one of the most common treatable conditions that affect our society. In fact, depression occurs at levels comparable to angina and coronary artery disease. The point prevalence of major depression in Western industrialized nations is from 2.3 to 3.2 percent for men and from 4.5 to 9.3 percent for women. The lifetime risk for depression is 7 to 12 percent for men and 20 to 25 percent for women. These statistics reflect the substantial burden that depression places on society.

The economic burden of depression, however, is more difficult to quantify. Some estimates show that depression accounts for about 1/3 of the direct costs of all mental illnesses ($67 billion in 1990). Of the costs related to depression, approximately 2/3 are related to direct medical expenses. Although estimates of the economic cost of depression vary, they were conservatively estimated at $16.3 billion in 1980, of which approximately 2/3 were direct medical costs.

Depression is widely perceived as an essentially self-limiting condition, where a history of good functioning is interrupted by brief periods of illness and subsequent recovery. More than 50 percent of patients have recurrent episodes of depression. Treatment can thus be seen as episodic in nature, with the handling of individual episodes.

The current Practice Guideline for Major Depressive Disorder in Adults, published by the American Psychiatric Association (APA) in 1993, describes different means of diagnosing and treating depression, and is incorporated herein by reference for its teachings about the diagnosis and treatment of depression.
Other literature also exists, for example, literature published by the Agency for Health Care Policy and Research (AHCPR), which describes the disease, its symptoms and the means of diagnosis and treatment.

To date, the treatment of depression has been on an individual basis. However, there are numerous reasons for cessation of individual treatment regimens, including all those factors that ordinarily enter into a "cost-benefit" analysis at an individual level (the likelihood of further improvement, the severity of the disease, side effects of the medication, etc.). Thus, in view of the global burden that depression creates for society, particularly the financial burden, it seems that alternative means of dealing with depression need to be explored.

For example, there is evidence in support of the efficacy of chronic maintenance therapy. Under this theory, the clinical goal would be the maintenance of euthymia rather than the repeated treatment of recurrent episodes, which may contribute to a deteriorating course of life. Under this theory, however, the assumptions seem to indicate that it may only be feasible to treat a portion of the population diagnosed with depression in this way, perhaps with targeted interventions in subgroups at risk of adverse consequences (in particular, recurrence).

There is, therefore, a need to be able to accurately and effectively identify subgroups of the population with depression at high risk of adverse health consequences.
SUMMARY OF THE INVENTION

The present invention involves a computer-implemented method for generating a model to identify at-risk patients diagnosed with depression from information about patients existing in a declarations database, said method comprising the steps of:

1) processing, based on previously determined criteria, the patient information in the declarations database, to find and extract information from the statements for a group of patients with depression;
2) defining, using the information available in the declarations database, events relevant to depression;
3) processing the information extracted from the declarations and the defined events, to create files containing event level information;
4) defining a time window to provide a time frame from which to judge whether events should be considered in subsequent processing;
5) defining a set of variables as potential predictors;
6) processing the event level information, using the time window and the set of variables, to generate an analysis file; and
7) performing statistical analysis on the analysis file to generate a prediction model, said prediction model being a function of a subset of the set of variables.

Another aspect of the present invention involves a computer-implemented method for identifying, using the generated model, at-risk patients diagnosed with depression, said method comprising the additional step of applying the prediction model to a processed statement database to identify and output to a file a list of the probability that each patient has an adverse health consequence.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is best understood from the following detailed description, when read in connection with the accompanying drawings, in which:

Figure 1A is a high-level flow diagram illustrating an exemplary overall process of the present invention.
Figure 1B is a high-level flow diagram illustrating an exemplary process of the application of the present invention.
Figure 2 is a high-level block diagram illustrating three exemplary information sources suitable for use with the present invention.
Figure 3 is a data structure diagram showing an exemplary format in which information from the sources of Figure 2 is stored in a research database.
Figure 4 is a data structure diagram showing an exemplary format for an event level file, generated during the process shown in Figure 1.
Figure 5 is a data structure diagram showing an exemplary format for an analysis file generated, in part, from the event level file shown in Figure 4, and during the process shown in Figure 1.
Figure 6A is a timeline diagram showing a first exemplary time window scheme, suitable for use in the data processing of the event level files shown in Figure 4.
Figure 6B is a timeline diagram showing a second exemplary time window scheme, suitable for use in the data processing of the event level files shown in Figure 4.
Figure 7A is a table showing the experimental results using a hospitalization (HL) indicator with Scheme 1 shown in Figure 6A.
Figure 7B is a table showing the experimental results using a High Cost indicator with Scheme 1 shown in Figure 6A.

DETAILED DESCRIPTION OF THE INVENTION

General Description

The present invention is designed to identify, in a previously determined population of patients with depression, those patients at high risk of adverse health consequences. The identification of this high-risk subgroup is an initial stage in attempts, for example targeted interventions, to avoid and/or improve their health consequences.

Initially, one or more sources of information are required that allow the identification of an initial population of patients with depression. Examples of sources include health care providers, such as doctors, hospitals and pharmacies, which maintain records for their patients. Individual records for each of these providers, however, may be dispersed, difficult to access, and/or have many different formats.

On the other hand, a more comprehensive source that contains this type of information exists in the health care statement records of any given benefit provider. Turning to the figures, Figure 1A is a high-level flow diagram illustrating an exemplary overall process of the present invention.
As illustrated in Figure 1A, information from "raw" declarations is received and stored in a database (e.g., DB2 format) represented by block 110. In the world of declarations processing, before this database of "raw" information can be useful, some preliminary processing is usually done, step 112, which could include handling of rejected statements, reconciliation of multiple statements, and so on. The output of this preprocessing step, represented by block 114, is a "cleaned up" database now stored, in the exemplary embodiment, in SAS format.

SAS is a well-known software package and format, produced by SAS Institute, Inc. of Cary, North Carolina. It should be noted that other data processing and storage formats can be used in the storage and processing of data, as will be appreciated by those skilled in the art. It should also be noted that the SAS formats, programming techniques and functions are described more fully in the SAS/STAT User's Guide, Version 6, Fourth Edition, Volumes 1 and 2, 1990 and SAS Language: Reference, Version 6, First Edition, 1990, which are incorporated herein by reference for their teachings regarding the SAS language, programming, functions and formats.

In addition, the SAS routines that are used for information processing as part of the present invention are used for computational operations, are executed on a computer, and are stored on a storage medium such as a magnetic tape, a disk, a CD-ROM or another suitable medium for storage and/or portability purposes. The stored software can then be used to operate a computer.

The benefit provider's statement records, although they contain important information, may not be organized in a way that allows efficient analysis. Thus, the next step is to perform another processing step (for example, selecting patients with depression, age, etc.), represented by block 116, to transform the "raw" data into a more appropriate and useful database.
That is, the output data from the processing step (i.e., the extraction) is a subset of the "raw" information and represents an initial universe of patients with depression on which additional processing is performed.

A next step, which is optional, is to perform a "quality check" on the initial universe of patients with depression. This step is somewhat subjective. This processing step, represented by block 118, uses intermediate output files to refine the extracted information by, for example, checking whether there is an imbalance in the extracted information, such as all statements being from individuals over 60 years of age or all statements being from men. This step, essentially a common sense check, can be done as many times as necessary to ensure the integrity of the data in the database.

At this point, the data in the database exists at the declaration level. The information that exists at the declaration level provides different information in the form of raw data elements. From the declaration level data, the next processing step, represented by block 120, creates new files (for example, primary file 1 and primary file 2) by reformatting the information into an event level format. Before this happens, a set of events (for example, a visit to the doctor for depression) relevant to depression is defined, using a combination of both the raw data elements available from the declaration information and clinical knowledge about depression. With these events defined, the declaration level information is used to create new files based on the events, rather than on the declarations. Having the information in an event level format is an important aspect of the present invention in that, among other things, it allows for added flexibility in subsequent analyses.

As represented in block 122, additional processing is performed on the event level data to generate an analysis file.
In particular, the processing is performed using input information representative of a sliding time window and a plurality of variables. The time window input limits the time periods in which the events of the primary files are considered. That is, the time window is used to identify an analysis region and a prediction region, where the activity in the analysis region is used to predict some previously determined consequence in the prediction region.

The selection of variables, both dependent and independent, for analysis is an important step that has an impact on the accuracy of the final prediction model. Dependent variables are representative of the desired outcome (i.e., an adverse health consequence that is to be predicted), while the independent variables are representative of the prediction factors. This processing step, step 122, can be easily reprogrammed, by means of the input parameters, for different time window settings as well as different modifications of the variables. The analysis file generated in this step is a member level file, which means that it is organized by member.

With the analysis file at hand, a model or technique is determined to identify high-risk subgroups. That is, as represented in step 124, the analysis file is used to develop an identification technique represented by an equation that incorporates a subset of the initial variables programmed into the processing step mentioned above. The resulting subset consists of those variables that best reflect a correlation with adverse health consequences, which in turn result in substantial use of health care resources (e.g., funds). It should be noted that the determination of the initial and final variables is an important aspect of the present invention, since the variables can have a significant impact on the accuracy of the identification of the subgroup.

The above model for identification can be developed, step 124, in different ways using statistical techniques.
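As a minimal sketch of what "an equation that incorporates a subset of the initial variables" can look like, the following assumes the model takes the logistic form named in the abstract. The variable names, coefficient values and intercept are hypothetical and chosen only for illustration; they are not taken from this specification.

```python
import math


def predicted_probability(intercept, coefficients, member_values):
    """Logistic form: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(
        coefficients[name] * member_values.get(name, 0.0)
        for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients for a small subset of variables (illustration only).
coefficients = {"num_hl_dep": 0.8, "num_er_dep": 0.5, "female": 0.1}
intercept = -2.0

# Hypothetical member-level rows, as they might appear in an analysis file.
members = {
    "A": {"num_hl_dep": 2, "num_er_dep": 1, "female": 1},
    "B": {"num_hl_dep": 0, "num_er_dep": 0, "female": 0},
}

# Score every member, then rank from highest to lowest predicted probability.
scores = {
    key: predicted_probability(intercept, coefficients, values)
    for key, values in members.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Ranking members by predicted probability is one plausible way to produce a list of at-risk patients; the cut-off used to select the high-risk subgroup would be a separate choice.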
The technique that is used in the exemplary embodiment of the present invention to generate the model is multiple logistic regression.

Figure 1B is a high-level flow diagram illustrating an exemplary process of the application of the present invention. Having developed the model, as shown in Figure 1A, it can be applied to updated statement data, step 132, or to other databases of patients with depression (for example, statement information from other benefit providers), in order to identify at-risk patients diagnosed with depression, step 134, allowing different types of targeted intervention to maximize the effective distribution of health care resources.

Exemplary Embodiment of the Invention

Although the present invention is illustrated and described below with respect to specific examples of a method and system for identifying patients with depression at high risk of adverse health consequences, the invention is not intended to be limited to the details shown. Rather, different modifications can be made in the details, within the scope and range of equivalents of the claims and without departing from the spirit of the invention.

As mentioned, the present invention is designed to identify patients with depression at high risk of adverse health consequences. Identifying this high-risk subgroup is the first step toward being able to try different treatment techniques (for example, targeted interventions).

Initially, a source of information that allows the identification of a population of patients with depression is required. A comprehensive source that contains this type of information exists in the health care statement records of many benefit providers. As is known, declarations for drugs, doctors and hospitals are received and processed for payment/compensation.
In the exemplary embodiment of the present invention, this declaration information is entered into a DB2 database on the benefit provider's computer system (not shown).

Figure 2 is a high-level block diagram illustrating three exemplary information sources suitable for use with the present invention. As illustrated in Figure 2, the declaration information of such a provider would typically include three sources: pharmacy statements (Rx) 210, doctor's statements (DR) 212, and hospital statements (HL) 214. As listed in the blocks that represent the statement information, many types of information would be available from the respective declarations, including drug codes, names of doctors, diagnostic codes, procedures, different appointments and other important information. Much of this information is referenced using codes, such as drug codes, procedure codes and disease codes. Appendices I-VI provide lists of the different codes that are used with the present invention. These codes were selected for the purpose of processing in the present invention from a voluminous source of codes and, as will be appreciated by those skilled in the art, can be modified to include/exclude codes considered more/less useful in the different stages of processing.

The DB2 database represents a source of "raw" data elements that require processing. A first step in processing this raw data is to perform data integrity checks (for example, for rejected or reconciled statements). Subsequently, the data is routinely downloaded into a "research" database. The research database is a declaration level database in SAS format.

Figure 3 shows the exemplary formats for each of the pharmacy, doctor and hospital declarations of the records contained in the research database. As shown in Figure 3, the statements are listed from statement 1 to statement x, and appropriate information is also presented for the particular service provider (e.g., pharmacy).
Once the data is in SAS format, SAS procedures process the information to 1) extract patients with depression (step 116), 2) process the declaration level information into event level information (step 120), 3) generate analysis files, using previously determined variables and time frame schemes (step 122), and 4) create a prediction model as a function of those variables that best reflect the correlation with an adverse health consequence (step 124).

It should be mentioned that, from a statistical perspective, an important consideration when developing prediction models from data sets is the sample size, which affects the integrity of the prediction model. The prevalence of depression is reported to be about 5 percent; however, the sample sizes required to determine prediction equations depend on the magnitude of association between variables. Since these associations are unknown, all patients within any individual plan are initially included.

The first step, extracting patients with depression (step 116), uses different parameters to define which patients qualify for the initial global universe of patients with depression that will be considered. For example, in the exemplary embodiment of the present invention, only patients who have a continuous registration with the benefit provider of 12 months or more, and who have a statement for depression or for treatment with an antidepressant medication, are eligible. Of course, these criteria are exemplary and can be modified, for example so that 24 months or 6 months of registration is satisfactory, or so that an individual must be 18 years of age.

In the exemplary embodiment of the present invention, the statement extraction step, step 116, extracts all declaration data for patients with either an appropriate code for depression (see Appendix I) or treatment with an antidepressant drug (see Appendix III).
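The eligibility test described for step 116 can be sketched as follows. This is an illustration in Python rather than the SAS routines of the exemplary embodiment. The field names, the representation of continuous registration as a single start and end date, and the short code lists are all assumptions; the code lists are stand-ins and do not reproduce Appendix I or Appendix III.

```python
from datetime import date

# Stand-in code lists (assumptions); the actual lists are in Appendices I and III.
DEPRESSION_ICD9 = {"296.2", "296.3", "311"}
ANTIDEPRESSANT_DRUGS = {"fluoxetine", "amitriptyline"}


def months_enrolled(start, end):
    """Count whole calendar months from start to end, ignoring the day of month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def is_eligible(member, statements, min_months=12):
    """Apply the two exemplary criteria: registration length and a qualifying statement."""
    enrolled = months_enrolled(member["enroll_start"], member["enroll_end"])
    if enrolled < min_months:
        return False
    # A single depression diagnosis or antidepressant statement is enough.
    return any(
        s.get("icd9") in DEPRESSION_ICD9 or s.get("drug") in ANTIDEPRESSANT_DRUGS
        for s in statements
    )


# Hypothetical member registered from January 1992 through June 1993.
member = {"enroll_start": date(1992, 1, 1), "enroll_end": date(1993, 6, 30)}
eligible = is_eligible(member, [{"icd9": "311"}])
```

The `min_months` parameter reflects the statement that the 12-month criterion is exemplary and could instead be set to 24 or 6 months.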
It should be noted that in the health care industry various codes are used in the declaration information to indicate which procedures, treatments, diagnoses, drugs, etc., are being declared. For the exemplary embodiment of the present invention, the selected codes are shown in Appendices I-VI. These codes were found in the Physician's Current Procedural Terminology (CPT), American Medical Association (1995) and the St. Anthony's ICD-9-CM Code Book (1994), both incorporated herein by reference for their teaching of codes and code sources. As will be appreciated by those skilled in the art, any set of codes representative of the different procedures, treatments, diagnoses, drugs, etc., relevant for use with the present invention would suffice. Throughout this specification there is reference to such codes.

After the declaration extraction step, declaration adjustment and integrity checks are optionally performed, step 118. To do this, intermediate output files containing sets of frequency counts are generated from the data set defined above for processing purposes. In the exemplary embodiment of the present invention, intermediate output files are generated to review the following characteristics:

a. frequency counts of unique members by sex, age groups (0-9, 10-19 ...) and duration of registration in months, including:
   i) tables showing the count of members by sex,
   ii) tables showing the count of members within age groups,
   iii) tables of age groups, classified by sex,
   iv) tables of duration of registration in months, that is, from 1 month to the maximum possible number of months.
b. frequency counts of the ICD codes for depression (Appendix I), that is, the number of members that have at least one match with each of the ICD codes of Appendix I:
   i) at any level,
   ii) as the first code.
c. frequency counts of antidepressant drugs (Appendix III):
   i) number of members who have at least one declaration for each of the drugs in Appendix III.
d. counts of members that become eligible for processing due only to the ICD code, only to the drug, and to both the ICD code and the drug.
e. frequency counts of the number of all statements within each file (hospital, doctor, pharmacy) per member.
f. frequency counts of ICD codes (using only the first 3 digits of the ICD codes) of any nature in the doctor (any position) and hospital files, at least the top 10 with the frequency of each, that is, 2 tables, one each for the doctor and hospital files.
g. hospitalization frequency counts per calendar month, counting calendar months backward from the last month of eligibility or data availability. The last month for which data are available is month 1, the second-to-last month is month 2, and so on.
h. frequency counts of procedures related to depression (CPT codes, Appendix II).
i. frequency counts of all CPT codes (at the level of the first 3 digits), at least the top 10.
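The member counts listed under item a can be sketched as follows, again in Python rather than SAS. The record layout (one dictionary per unique member with `sex` and `age` fields) is an assumption, and `dominant_share` is a hypothetical helper for spotting a lopsided table; it is not part of the described process.

```python
from collections import Counter


def age_group(age):
    """Map an age in years to a ten-year band such as '60-69'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def frequency_counts(members):
    """Build the sex, age-group, and age-group-by-sex tables of item a."""
    by_sex = Counter(m["sex"] for m in members)
    by_age = Counter(age_group(m["age"]) for m in members)
    by_age_sex = Counter((age_group(m["age"]), m["sex"]) for m in members)
    return by_sex, by_age, by_age_sex


def dominant_share(table):
    """Return the most frequent category and its share of all members."""
    total = sum(table.values())
    label, count = table.most_common(1)[0]
    return label, count / total


# Hypothetical extract with one record per unique member.
members = [
    {"sex": "F", "age": 34},
    {"sex": "F", "age": 67},
    {"sex": "M", "age": 61},
]
by_sex, by_age, by_age_sex = frequency_counts(members)
top_sex, top_share = dominant_share(by_sex)
```

A reviewer could compare `top_share` against a chosen limit to notice, for example, an extract made up almost entirely of one sex or one age band.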

The above frequency counts, used in conducting preliminary evaluations of the integrity of the data, are exemplary and can be modified to include/exclude parameters that are shown to be more/less useful. With this information, a "quality check" is made on the initial universe of patients with depression, to ensure that the final result, that is, the prediction model, is not unreasonably distorted by spurious input information. This processing step, block 118, using intermediate output files, allows a refinement of the extracted information by revealing conditions such as all statements being from individuals over 60 years of age, all statements being from men, or other data imbalances that would otherwise corrupt the integrity of a prediction model. Step 118, in the exemplary embodiment, is performed manually by viewing the intermediate output files. It is contemplated, however, that by using different threshold values, the frequency counts can be automatically screened for a potential imbalance.

Having now extracted and refined the declaration level information in accordance with different previously determined criteria considered relevant for purposes of subsequent processing, the information is converted to an event level format. To provide processing flexibility, particularly when allocating time windows for analysis, the second step mentioned above (i.e., converting declaration level information into event level information, step 120) is employed to generate two primary data files from which an analysis file can be created.
In the exemplary embodiment of the present invention, primary data file 1 is a member level file, and it contains all the data of a static nature (i.e., not time sensitive), such as:

1) Member's key,
2) Date of birth,
3) Sex,
4) First available date of registration (i.e., start of the data set (1/1/92) or date of registration),
5) Final date of registration (i.e., end of the data set or last registration date),
6) Date of the first event of depression (first prescription of antidepressant or hospitalization for depression),
7) Date of last hospitalization,
8) Number of records in the event file (primary file 2), and
9) Mode of entry into the data set (e.g., i) only by antidepressant drug, ii) only by diagnosis of depression, iii) both by antidepressant drug and by diagnosis of depression).

Primary data file 2 is an event level file with a record for each event, ordered by member and by the chronological date of the event; in the present invention, events are presented in descending order of the date of the event.

It should be noted that an event, which is sometimes referred to as an episode, is an occurrence that, based on clinical knowledge, is considered relevant to depression. With knowledge of which raw data elements are available from the declarations, a set of events is directly or indirectly defined from the data elements, where the events can be based on an individual data element or a combination of data elements, or can be derived from individual or multiple data elements.

Figure 4 is an exemplary list of events and format for primary file 2 (an event level file). As shown in Figure 4, the entries provided include:

1. Hospitalization for depression
   a. Any hospital declaration identified by the hospital site code.
   b. Having from and to dates spanning at least 1 day.
   c. Having an ICD-9 code.
   d. ICD-9 code for depression occurring in any position.
   e. Disease indicator (Appendix V): 1 = major disease, 2 = suicide, 3 = major disease and suicide, 0 = anything else.
2. Emergency room due to depression
   a. Visit to the emergency room, identified by the emergency room site code.
   b. Having an ICD-9 code (see Appendix I).
3. Visit to the doctor (not the hospital) for depression
   a. Any doctor's statement.
   b. Having an ICD-9 code (see Appendix I).
   c. Category: psychiatrist = 1, any other = 0.
4. Prescription for SSRI
   a. Therapeutic class SSRI (selective serotonin reuptake inhibitors), 5.5.1.3.
   b. Cost = 0 if it is generated by admission to the hospital.
   c. Category indicator = blank.
5. Prescription for TCA (tricyclic antidepressants) or MAOI (monoamine oxidase inhibitors)
   a. Therapeutic classes 5.5.1.1 (tertiary amines), 5.5.1.2 (secondary amines), 5.5.1.4 (monoamine oxidase inhibitors), and 5.5.2.
   b. Cost = 0 if it is generated by admission to the hospital.
   c. Category indicator = therapeutic class: 1 = 5.5.1.1, 2 = 5.5.1.2, 3 = 5.5.1.4, 4 = 5.5.2.
6. Prescription for another neuroactive drug (from the pharmacy file).
7. Procedure for depression (from doctor and hospital files)
   Category, with CPT codes and ICD procedure codes:
   0 = Psychotherapy: all CPT and ICD codes in Appendix II that are not listed below.
   1 = Diagnosis: CPT 90801, 90820, 90825, 90830, 90862; ICD 94.0x, 94.1x, 94.21, 99.22, 94.23.
   2 = Shock therapy: CPT 890,870, 908,712; ICD 94.24, 94.26, 94.27.
   For this entry, the costs are assigned to the doctor's visit or hospitalization in which the procedure occurred.
8. Hospitalization, not depression. It should be noted that the events under entry 8 could have been for a condition other than depression, although these patients enter the group by virtue of having received a diagnosis of depression or having received an antidepressant at some point, making it likely that these procedures were for depression.
   a. Any hospitalization having from and to dates spanning at least one day.
   b. ICD-9 codes for major disease (see Appendix V).
   c. Category as in entry 1 above (1 = major, 2 = suicide, 3 = both, 0 = anything else).

The counts for entries 9-13 are added for each month.
The date is the date of the first occurrence of the identified events. In the number field, the number of identified events occurring in that month is added.

9. Emergency room, not for depression
   a. Visit to the emergency room, identified by the emergency room code.
10. Visit to the doctor (outpatient), not for depression
   a. Any visit to the doctor.
   b. Excluding a visit with a diagnosis of depression (Appendix I), that is, not in entry 3 above.
11. Prescription for possibly related drugs
   Drugs identified in Appendix IV.
12. Prescription for any other drugs (not for depression)
   All drugs not included in Appendices III or IV.
13. Procedure not for depression (from doctor and pharmacy files)
   a. Category indicator: 1 = major procedure, 2 = minor procedure (see Appendix IV).

After the generation of the two primary files using the instructions described above, corresponding to step 120 of Figure 1A, further processing is performed on the event level data to generate an analysis file, step 122. An exemplary format for the analysis file is shown in Figure 5. As shown, the analysis file format includes a list of members in the first column of a table. Across the top of the table is a list of variables, described in detail later. The body of the table indicates the relationship of each member to each listed variable.

In particular, the processing of the primary files into the analysis files includes an algorithm defined, in part, by a time window and a plurality of variables. The algorithm can be reprogrammed for different adjustments to the time window, as well as modifications to the variables. The analysis file generated in this step is a member level file (that is, organized with respect to the members).

The main analysis files are member level files derived from the information in the primary files. Each main analysis file is created to take into account a single reference time window of censored events and a prediction window of interest for that file.
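How a time window turns one member's event records into a single row of the analysis file can be sketched as follows. This is a simplified illustration, not the SAS routine of the exemplary embodiment. The event type labels, field names and window boundaries are assumptions, and only three count variables and one hospitalization indicator are shown.

```python
from datetime import date

# Hypothetical event type labels mapped to count variables in the analysis row.
COUNT_VARIABLES = {
    "HL_DEP": "num_hl_dep",  # hospitalization for depression
    "ER_DEP": "num_er_dep",  # emergency room visit for depression
    "DR_DEP": "num_dr_dep",  # doctor visit for depression
}


def build_analysis_record(events, analysis_start, cutoff, prediction_end):
    """Count events in the analysis region and flag a hospitalization
    for depression in the prediction region."""
    record = {name: 0 for name in COUNT_VARIABLES.values()}
    record["hl_indicator"] = 0
    for event in events:
        if analysis_start <= event["date"] < cutoff:
            # Analysis region: contributes to the independent variables.
            name = COUNT_VARIABLES.get(event["type"])
            if name is not None:
                record[name] += 1
        elif cutoff <= event["date"] <= prediction_end:
            # Prediction region: contributes to the dependent variable.
            if event["type"] == "HL_DEP":
                record["hl_indicator"] = 1
    return record


# Hypothetical events for one member.
events = [
    {"type": "ER_DEP", "date": date(1991, 12, 1)},  # before the window, ignored
    {"type": "HL_DEP", "date": date(1992, 3, 1)},
    {"type": "DR_DEP", "date": date(1992, 5, 10)},
    {"type": "DR_DEP", "date": date(1992, 6, 10)},
    {"type": "HL_DEP", "date": date(1993, 2, 1)},   # in the prediction region
]
row = build_analysis_record(
    events,
    analysis_start=date(1992, 1, 1),
    cutoff=date(1993, 1, 1),
    prediction_end=date(1993, 6, 30),
)
```

Changing the three date arguments corresponds to applying a different time window scheme, which yields a different analysis row from the same event level data.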
Each new time window applied to the data, in the exemplary embodiment, requires another main analysis file.

To generate the analysis file, a time window scheme, together with a plurality of variables, is applied to the event level data. Discussing the variables first, both independent and dependent variables are included in the processing. The independent variables basically represent potential predictors of adverse health consequences, while the dependent variables basically represent the adverse health consequence that is to be predicted.

To determine the exemplary independent variables for step 122, as many original data elements as possible are used, without assuming anything about depression. Then, based on clinical knowledge, additional variables are created. In addition, combinations of the data elements and/or variables are used as variables, based on clinical knowledge. Finally, some variables can be created and used based on their potential usefulness as a point of influence in the management of the disease.

It should be noted that, for the purpose of establishing a cost hierarchy, the following rules were used in the exemplary embodiment of the present invention:

1. Only hospitalizations for depression can generate other events.
2. Hospital costs include all pharmacy, procedure and medical charges.
3. Hospital visits can generate pharmacy and procedure events, with costs set to zero (included in the cost of the hospital).
4. Hospital visits cannot generate separate doctor visit events.

In the exemplary embodiment of the present invention, the plurality of variables currently used in step 122 in the SAS routine for generating an analysis file from an event level file are shown below in Table 1.
In Table 1, although the abbreviations should be self-evident, by way of example, some abbreviations are as follows: "DEP" means depression, "HL" means hospitalization, "#" means number, "MOS" means months, "OTH" means other, "ER" means emergency room, "RX" means prescription, "SUP" means supply, "PROCS" means procedures, and "TOT" means total.
Table 1
1. "DEPENDENT CODE"
2. "HL INDICATOR FOR DEPRESSION"
3. "# MOS AVAILABLE FOR ANALYSIS"
4. "AGE AT THE TIME OF CUTOFF"
5. "FEMALE INDICATOR"
6. "TOTAL COST DURING PERIOD OF ANALYSIS"
7. "# OF DRUG CLASS CHANGES FOR DEPRESSION"
8. "DAYS SUPPLY OF DRUGS FOR DEPRESSION"
9. "# OF HLS FOR DEP"
10. "# OF HLS OTH DEP"
11. "# OF HLS FOR DEP AND MAJOR ILLNESS"
12. "# OF HLS FOR DEP AND SUICIDE"
13. "# OF HLS FOR DEP AND MAJOR ILLNESS AND SUICIDE"
14. "# OF HLS FOR DEP AND CODE RELATED TO DEP"
15. "# OF DURATION OF STAY IN HL FOR DEP"
16. "# OF VISITS TO ER FOR DEP"
17. "# OF VISITS TO THE DR FOR DEP"
18. "# OF VISITS TO THE DR / PSYCHIATRIST"
19. "# OF RX FOR SSRI"
20. "# DAYS SUP OF SSRI"
21. "# OF RX FOR TCA"
22. "# RX TCA: TERTIARY"
23. "# RX TCA: SECONDARY"
24. "# RX TCA: MONOAMINE OXIDASE INHIBITORS"
25. "# RX TCA: ANY OTHER TYPE"
26. "# DAYS SUP OF TCA"
27. "# DAYS SUP OF NEUROACTIVE (NA)"
28. "# DAYS SUP OF NA: ANXIOLYTICS AND SEDATIVES"
29. "# DAYS SUP OF NEUROACTIVE: ANY OTHER"
30. "# OF PROCS FOR DEPRESSION"
31. "# OF PSYCHOTHERAPY PROCS FOR DEPRESSION"
32. "# OF DIAG PROCS FOR DEP"
33. "# OF SHOCK THERAPY PROCS FOR DEPRESSION"
34. "# OF HOSPITALIZATIONS FOR OTH"
35. "# OF HLS OTH FOR ANY OTH"
36. "# OF HLS OTH AND MAJOR ILLNESS"
37. "# OF HLS OTH AND SUICIDE"
38. "# OF HLS OTH AND MAJOR ILLNESS AND SUICIDE"
39. "# OF HLS OTH AND CODE RELATED TO DEP"
40. "# OF DURATION OF STAY IN HL OTH"
41. "# OF ERS OTH"
42. "# OF VISITS TO DR OTH"
43. "# OF RX FOR RELATED DRUGS"
44. "# OF RX DAYS SUP FOR RELATED DRUGS"
45. "# OF RX FOR ANY OTHER"
46. "# OF RX DAYS SUP FOR ANY OTHER"
47. "# OF PROCS NOT FOR DEP"
48. "# PROCS FOR MAJOR ILLNESS"
49. "# PROCS FOR MINOR DISEASE"
50. "% COST OF HL FOR DEP OF TOT COST"
51. "% COST OF ER FOR DEP OF TOT COST"
52. "% COST OF SSRI FOR DEP OF TOT COST"
53. "% COST OF TCA FOR DEP OF TOT COST"
54. "% COST OF NEUROACT FOR DEP OF TOT COST"
55. "% COST OF HL OTH OF TOT COST"
56. "COS COST OF TOTAL OTH COST"
57. "DR COS COST OF TOT COST"
58. "COST OF RX RELATED TO COST TOT OTH"
59. "COST OF RX FOR ANY OTHER COS COST TOT"
60. "# COST OF EVENTS RELATED TO DEP"
61. "# COST OF EVENTS RELATED TO OTH"
62. "# COST OF DRUGS FOR DEP OF ALL DRUG COSTS"
63. "# COST OF DRUGS FOR OTH OF ALL DRUG COSTS"
64. "# COST OF DRUGS FOR DEP OF ALL COSTS"
65. "# COST OF DRUGS FOR OTH OF ALL COSTS"
66. "# COST OF DRUG COSTS BY DEP COST"
67. "% TCA COST OF DRUG COSTS FOR DEP"
68. "% NEUROACTIVE COST OF DRUG COSTS FOR DEP"
69. "INDICATOR OF HL FOR DEP IN THE LAST 12 MONTHS"
70. "INDICATOR OF ER FOR DEP IN THE LAST 12 MONTHS"
71. "MOS BETWEEN FIRST AND LAST EVENT"
72. "MOS SINCE FIRST DEP EVENT"
73. "MOS SINCE LAST DEP EVENT"
74. "MOS OF DATA USED FOR ANALYSIS"
75. "MEASUREMENT OF RX COMPLIANCE FOR DEP"
76. "# OF HL FOR DEP BY GENDER INTERACTION"
77. "# OF ER FOR DEP BY GENDER INTERACTION"
78. "# OF DR VISITS FOR DEP BY GENDER INTERACTION"
79. "# RX SSRI BY GENDER INTERACTION"
80. "# RX TCA BY GENDER INTERACTION"
81. "# RX NEUROACTIVE BY GENDER INTERACTION"
82. "# PROCS FOR DEP BY GENDER INTERACTION"
83. "# OF UNIQUE GENERAL DRS USED"
84. "# OF UNIQUE GENERAL USERS"
Turning to the dependent variables, the potential dependent variables contemplated for use with the present invention, as outcomes to be predicted, include, for example:
1. Admission to the hospital (HL) or visit to the emergency room (ER) due to depression. This is a dichotomous variable referred to as the HL (or ER) indicator, such that HL (or ER) = 1 if an admission or ER visit occurred; otherwise the indicator equals 0.
2. Top 10 percent of resource utilization, measured in dollars.
Resources are counted from the time of the first diagnosis of depression or receipt of the first antidepressant (in the record) + 1, 3 and 6 months, with separate analyses for each time period. Again, this is a dichotomous variable, referred to as the High Cost indicator, such that if the patient is in the top 10 percent, High Cost = 1; otherwise High Cost = 0. In the exemplary embodiment, the distribution of total cost per member per month (PMPM) in the prediction region (B to C) is used to define this variable. The High Cost indicator is set to 1 for the 10 percent of members with the highest total cost per member in the Total Cost distribution and is set to 0 for all others.
3. Any admission to the hospital for attempted suicide, identified by a statement bearing any of the ICD-9 codes 300.0 or 800-999. As those of ordinary skill in the art will appreciate, the use of suicide attempt as a dependent variable can only provide useful results if there is a sufficient number of occurrences.
Although only three dependent variables are listed above, as will be appreciated by those of ordinary skill in the art, other variables, known or not yet known, may also suitably serve as dependent variables within the scope of the present invention. Returning to the time window aspect of generating analysis files, it should be noted that there is one analysis record for each selected member. In the present invention, different schemes have been developed, as described below, to define the prediction zones and to censor the data used to create the analysis file. That is, a time window defines a prediction zone, or region, and an analysis region, where activity in the analysis region is used to predict something in the prediction zone. Additional time window schemes may also suitably serve the present invention.
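The High Cost indicator described above can be sketched as follows. The function name, the sample costs and the rounding rule (rounding the number of flagged members up) are assumptions for illustration; the text only states that the 10 percent of members with the highest total cost receive a value of 1.

```python
def high_cost_indicator(total_cost_by_member, top_percent=10):
    """Return {member_id: 1 or 0}, flagging the top `top_percent` by cost."""
    n = len(total_cost_by_member)
    # Integer ceiling of n * top_percent / 100 (rounding up is an assumption).
    n_flagged = (n * top_percent + 99) // 100
    # Rank members from highest to lowest total cost in the prediction region.
    ranked = sorted(total_cost_by_member,
                    key=total_cost_by_member.get, reverse=True)
    flagged = set(ranked[:n_flagged])
    return {member: int(member in flagged) for member in total_cost_by_member}


# Hypothetical total costs (dollars) in the prediction region B to C.
costs = {f"M{i:03d}": 100 * i for i in range(1, 11)}
indicator = high_cost_indicator(costs)
print(indicator)
```

With ten members, exactly one member (the one with the highest cost) is flagged, so the indicator is 1 for that member and 0 for all others.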
For explanatory purposes, the time span covered by the statement history is referred to as the time window that starts at point 'A' and ends at point 'C'. The time interval is divided into analysis and prediction regions by point 'B', such that A < B < C. As an example, Jane Doe's analysis record is based on statements from 1/1/91 to 6/30/93. Therefore, A = 1/1/91, C = 6/30/93, and B can be selected somewhere between the two, such as 12/31/92. Generally, A is defined based on the data extraction protocol (that is, the date from which data are available) and C is defined by the last day on which the member is still enrolled and eligible for benefits. Of course, variations of those general definition points may be selected within the scope of the present invention. The definition of B is important. In the present invention, two basic definitions of B were devised to maximize the accuracy of the prediction model. However, as will be understood by those skilled in the art, alternative definitions of B are contemplated. Figure 6A shows a first exemplary time window scheme, referred to as Scheme 1, for use in processing the data of the event level files shown in Figure 4. In Scheme 1, the event prediction region is set from B to C in such a way that B = C - (x months) for all members in the analysis. For example, if a 6-month model of hospitalization for depression (HL) is to be built (that is, hospitalization for depression is used as the dependent variable), then B = C - (6 months). In Jane Doe's example, B would equal 12/31/92. Therefore, only data ranging from A to B (1/1/91 to 12/31/92) are used to predict hospitalization for depression in the 'following 6 months'. The phrase 'following 6 months' in this context implies that time point B is "NOW", any time after it is in the FUTURE and any time before it is in the PAST.
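The Scheme 1 window arithmetic for the Jane Doe example can be sketched as below. The helper names are hypothetical, and the month-end rule (a month-end date stays a month-end date after the shift) is an assumption chosen so that 6/30/93 minus six months gives 12/31/92, as in the text.

```python
import calendar
from datetime import date


def subtract_months(d, months):
    """Shift d back by whole months; month-end dates stay month-end."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    was_month_end = d.day == calendar.monthrange(d.year, d.month)[1]
    day = last_day if was_month_end else min(d.day, last_day)
    return date(year, month, day)


def scheme1_regions(a, c, prediction_months):
    """Return the analysis region (A, B) and prediction region (B, C)."""
    b = subtract_months(c, prediction_months)
    if not a < b < c:
        raise ValueError("Scheme 1 requires A < B < C")
    return (a, b), (b, c)


A, C = date(1991, 1, 1), date(1993, 6, 30)
analysis_region, prediction_region = scheme1_regions(A, C, 6)
B = analysis_region[1]
# Window for a variable such as '# of visits in the LAST 6 months'.
last_6_months = (subtract_months(B, 6), B)
print(analysis_region, prediction_region, last_6_months)
```

Here B plays the role of "NOW": statements dated on or before B feed the independent variables, and statements between B and C determine the dependent variable.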
The NOW/PAST/FUTURE convention is a key concept in Scheme 1, and it is important for understanding the implementation and application of the prediction model. As an additional explanation, when a variable is defined as '# of psychotherapeutic visits in the LAST 6 months', this means that the count for this variable is based on the statements from (B - 6 months) to B for all members in the analysis. It should be noted, however, that point B may vary with each member in the analysis population. An alternative to Scheme 1, referred to as Scheme 2, is illustrated in Figure 6B, which shows a second exemplary time window scheme for use in processing the data of the event level files shown in Figure 4. A difference between Scheme 1 and Scheme 2 is the definition of the prediction region for members who have at least one hospitalization or emergency room visit for depression (HL/ER). In Scheme 2, the prediction region that starts at point B is defined in multiple passes over the record of each member. Returning to Jane Doe's analysis record (from 1/1/91 to 6/30/93, A = 1/1/91, C = 6/30/93) to illustrate how point B is defined, suppose that Jane Doe was hospitalized for depression three times: on 4/1/91, 4/1/92 and 4/1/93. Point B is set equal to the date of the first hospitalization or emergency room visit for depression minus 1 month, or equal to point C if the member has never had a hospitalization or emergency room visit for depression in the statement history. For Jane Doe, the first such date is 4/1/91. In the exemplary embodiment of the present invention, the one-month setback from the date of hospitalization or emergency room visit for depression is applied to simulate the environment in which the model is used. There is probably an interval of at least 30 days between the model registration and the disease management actions based on the registration reports. In this way, in Jane Doe's record, B = 4/1/91 - (1 month) = 2/28/91.
Jane's record in this case will not be used in the construction of the model, because the time span of the analysis region is only two months, less than the exemplary six-month data history requirement. By repeating steps 1 and 2 using the second (or third, and so on) date of hospitalization or emergency room visit for depression to establish point B, Jane Doe's record would eventually be used in model construction in the second and third passes. This process, in the exemplary embodiment, ends after three or four passes, since there would probably be very few members with five or more hospitalizations or emergency room visits for depression in the study population. It should be noted that repeated modeling introduces the added complexity of establishing additional independent variables. An important advantage of Scheme 2, however, is that the average number of hospitalizations or emergency room visits for depression available for prediction would probably be greater than in Scheme 1. In yet another alternative embodiment, analysis weights may be used that reflect proximity to the event that is to be predicted, for example, within 3 months x 1, 3-6 months x .75, 6-9 months x .5, 9-12 months x .25, greater than 12 months x .125. Other suitable weighting techniques could be used, as will be appreciated by those skilled in the art. These types of weighting techniques can be used with either Scheme 1 or Scheme 2. Therefore, given a selected time window scheme and an appropriate set of previously determined variables, processing step 122 generates the analysis file. Using the analysis file, the model for identification/prediction can then be developed in various ways using statistical techniques. In particular, the analysis file is processed, now at a member level, using statistical functions available in SAS.
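The multi-pass definition of point B in Scheme 2, together with the proximity weights, can be sketched as follows. All function names are hypothetical. The one-month setback is implemented as a plain calendar-month shift, an assumption that may differ by a few days from the dates quoted in the text, and the treatment of the weight boundaries (lower bound inclusive) is likewise assumed.

```python
import calendar
from datetime import date

MIN_HISTORY_MONTHS = 6  # exemplary six-month data history requirement


def subtract_months(d, months):
    """Shift d back by whole calendar months, clamping the day if needed."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def whole_months_between(start, end):
    """Number of complete months from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    return months - 1 if end.day < start.day else months


def scheme2_passes(a, c, depression_events, setback_months=1):
    """Return one (pass_number, B, usable) tuple per depression HL/ER event."""
    if not depression_events:
        # No HL/ER event for depression: B is set equal to C.
        return [(1, c, whole_months_between(a, c) >= MIN_HISTORY_MONTHS)]
    passes = []
    for n, event_date in enumerate(sorted(depression_events), start=1):
        b = subtract_months(event_date, setback_months)
        usable = whole_months_between(a, b) >= MIN_HISTORY_MONTHS
        passes.append((n, b, usable))
    return passes


def proximity_weight(months_before_event):
    """Analysis weight by distance (in months) from the predicted event."""
    for upper, weight in ((3, 1.0), (6, 0.75), (9, 0.5), (12, 0.25)):
        if months_before_event < upper:
            return weight
    return 0.125


jane = scheme2_passes(date(1991, 1, 1), date(1993, 6, 30),
                      [date(1991, 4, 1), date(1992, 4, 1), date(1993, 4, 1)])
for pass_number, b, usable in jane:
    print(pass_number, b, "used" if usable else "excluded")
```

For the Jane Doe example the first pass is excluded for lack of history and the second and third passes are usable, which matches the outcome described in the text.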
In the exemplary embodiment of the present invention, the statistical processing performed to generate the prediction model is multiple logistic regression. As will be appreciated by those skilled in the art, other statistical techniques may also be suitable for use with the present invention. In the exemplary embodiment, the statistical processing, when applied to the analysis file, identifies variables that meet previously determined significance levels (for example, probability value < 0.05). These variables then form a prediction model, which is a mathematical equation of the following form: Logit(p) = a + b*x1 + c*x2 + ... + z*xi, where x1 ... xi are the identified variables and a ... z are their parameter estimates. The individual probability (p) for the outcome under consideration is then determined using the following formula: p = e^(-Logit(p)) / (1 + e^(-Logit(p))). Figure 7 shows the experimental results for a model based on Scheme 1 and using the indicator of hospitalizations or emergency room visits for depression as the dependent variable. The resulting independent variables selected for the prediction model include "FEMALE INDICATOR", "# DEP HLS", "# DEP ERS", "# DR / PSYCHIATRIST VISITS", "# DEP PROCS", and "# OTH HLS AND CODE RELATED TO DEP". Figure 7B shows the experimental results, including the dependent variables, for a model also based on Scheme 1 but using the High Cost indicator as the dependent variable. It should be noted that, although both experimental results indicate that six independent variables were used for the prediction model, more or fewer independent variables could be used, based on their individual capacity to accurately predict the selected dependent variable. Then, the resulting model is applied to the data. That is, because the prediction zone of the previous processing was actually based on past data for analysis purposes, the model is now applied to the data in such a way that the prediction zone lies in the future.
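Scoring one member with a fitted model of this form can be sketched as below. The coefficient values and variable names are invented for illustration and are not the experimental results of Figure 7. The sketch uses the conventional inverse logit, p = 1 / (1 + e^(-L)); the formula as printed in the text places the negative exponent in the numerator as well, which under the conventional definition yields the complementary probability, so the sign convention should be checked against the fitting software before use.

```python
import math


def logit_score(intercept, coefficients, values):
    """Linear predictor: a + b*x1 + c*x2 + ... for the selected variables."""
    return intercept + sum(coefficients[name] * values[name]
                           for name in coefficients)


def probability(logit):
    """Conventional inverse logit (an assumption; see the lead-in)."""
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical parameter estimates for two selected variables.
INTERCEPT = -2.0
COEFFICIENTS = {"FEMALE_INDICATOR": 0.5, "N_DEP_HLS": 1.0}

# Hypothetical member: female, one prior hospitalization for depression.
member = {"FEMALE_INDICATOR": 1, "N_DEP_HLS": 1}

score = logit_score(INTERCEPT, COEFFICIENTS, member)
p = probability(score)
print(f"logit = {score:.2f}, p = {p:.4f}")
```

Only the variables retained in the model need to be computed for a member in order to score that member.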
The resulting model can be applied to existing data, to data as they are regularly updated, or to statement databases of other benefit providers. To do this, only the independent variables of interest need to be processed. Of course, as new statement databases are to be analyzed, the whole process can be repeated to generate a new model and determine whether other variables are better predictors. The result generated by applying the model is a file that contains a list of all patients with depression and, for each, an indicator representing the likelihood that the patient will have an adverse health outcome (that is, will experience the outcome defined by the dependent variable). This list can then be divided into subgroups, such as increments of 5 percent or 10 percent of patients, ordered by the likelihood of having an adverse health outcome. By applying the model to data from future statements and other databases of patients with depression, or by constructing a new model on a new database as described above, at-risk patients with depression can be identified, allowing various types of intervention to maximize the effective allocation of health care resources for patients with depression. This intervention can take the form of 1) specific case management, 2) novel interventions based on the characteristics of the subgroup, 3) high-risk intervention, 4) high (relative) cost intervention, or 5) plan modification, all adhering, of course, to best practice guidelines.
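Dividing the scored patient list into 5 percent or 10 percent increments can be sketched as follows. The function name and the sample probabilities are hypothetical; group 1 is taken to be the highest-risk increment, which is an assumption about labeling rather than something the text specifies.

```python
def risk_groups(probability_by_member, increment_percent=10):
    """Assign each member to a risk increment; group 1 is the highest risk."""
    ranked = sorted(probability_by_member,
                    key=probability_by_member.get, reverse=True)
    n = len(ranked)
    groups = {}
    for rank, member in enumerate(ranked):
        # Integer arithmetic: position in the ranking as a percentage band.
        groups[member] = (rank * 100) // (n * increment_percent) + 1
    return groups


# Hypothetical model output: predicted probability per patient.
scores = {f"M{i:03d}": i / 20 for i in range(1, 21)}
deciles = risk_groups(scores, increment_percent=10)
top_group = sorted(m for m, g in deciles.items() if g == 1)
print(top_group)
```

With twenty patients and 10 percent increments, each group holds two patients, and the first group contains the two patients with the highest predicted probability.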
Appendices I-VI follow.

Appendix I
ICD-9-CM Depression Codes

Appendix II
CPT-4 or ICD-9-CM Procedure Codes for Psychotherapy
ICD-9-CM procedure codes:
9426 Subconvulsive electro-shock therapy
9427 Other electro-shock therapy
943-9439 Individual psychotherapy
944-9449 Psychotherapy and counseling
CPT-4 codes:
908xx All psychiatric procedure codes
90841-90844 Individual medical psychotherapy
90846-90849 Family medical psychotherapy
90855 Individual interactive psychotherapy
90857 Group interactive psychotherapy
90870-90871 Electroconvulsive therapy

Appendix III
Antidepressant Agents (from the national DPS formulary, 1995)
Therapeutic class 5.5.1.1 (Tertiary amines): amitriptyline, doxepin, imipramine, trimipramine, clomipramine
Therapeutic class 5.5.1.2 (Secondary amines): desipramine, nortriptyline, amoxapine, protriptyline
Therapeutic class 5.5.1.3 (Selective Serotonin Reuptake Inhibitors): paroxetine, sertraline, fluoxetine
Therapeutic class 5.5.1.4 (Other Antidepressants): amitriptyline/perphenazine, trazodone, bupropion, venlafaxine
Therapeutic class 5.5.2 (Monoamine Oxidase Inhibitors): isocarboxazid, phenelzine, tranylcypromine

Appendix IIIa
Neuroactive drugs that are not for depression
All 5.x codes that are not in the previous appendix

Appendix IV
Drugs that may be used excessively by patients with severe depression (DPS formulary, 1995)
Possibly related drugs:
51. Analgesics and other medications for headache
9.1 Antacids
9.2 Antidiarrheals
9.3 Antispasmodics
9.4 Anti-ulcer agents
9.5 Laxatives
9.6 Other GI
11.1.1 Salicylates
11.1.2 Non-steroidal anti-inflammatory drugs
11.3.1 Direct muscle relaxants
11.4 Other muscle relaxants
12.1.2 Multivitamins, fluorides, B2, folic acid, therapeutic vitamins
13.1.1 Prenatal vitamins
13.7 Oral contraceptives
15.2.1 Antihistamines
15.2.2 Decongestants
15.2.3 Combination antihistamines/decongestants
15.3 Antitussives and expectorants

Appendix V
Diagnosis of major disease: ICD-9 code
Neoplasm (any site, any type): 140-239
Ischemic heart disease (any form): 410-414
Cardiopulmonary disease: 415-417
Cardiac insufficiency: 428
Cerebrovascular disease: 430-438
Chronic obstructive pulmonary disease: 490-496
Non-infectious enteritis and colitis: 555-558
Nephritis, nephrotic syndrome and nephrosis: 580-589
Normal delivery and other indications for care ...: 650-659
Injury and poisoning: 800-999
Suicide risk: 300.9
Attempt to commit suicide by drug: E9502-E952

Appendix VI
Major procedures: these are considered to be essentially any surgical procedure. CPT codes 10040-69979
Minor procedures: these are multiple screening tests and drug screening. CPT codes 80002-80103

Claims

1. A computer-implemented method for generating a model to identify at-risk patients diagnosed with depression, from information about patients existing in a declarations database, said method comprising the steps of: processing, based on previously determined criteria, the patient information in the declarations database, to extract statement information for a group of patients with depression; defining, using the information available in the declarations database, a set of events relevant to depression; creating, using the extracted statement information and the defined events, files that contain event level information; defining a time window to provide a time frame from which to judge whether events should be considered in subsequent processing; defining a set of variables as potential predictors; processing the event level information, using the time window and the set of variables, to generate an analysis file; and performing statistical analyses on the analysis file, to generate a prediction model to be used in the identification of at-risk patients diagnosed with depression, said prediction model being a function of a subset of the set of variables.

2. A computer-implemented method for identifying at-risk patients diagnosed with depression, from information about patients existing in a declarations database, said method comprising the steps of: processing, based on previously determined criteria, the patient information in the declarations database, to find and extract statement information for a group of patients with depression; defining, using the information available in the declarations database, a set of events relevant to depression; processing the extracted statement information and the defined events to create files that contain event level information; defining a time window to provide a time frame from which to judge whether events should be considered in subsequent processing; defining a set of variables as potential predictors; processing the event level information, using the time window and the set of variables, to generate an analysis file; performing statistical analyses on the analysis file, to generate a prediction model, said prediction model being a function of a subset of the set of variables; and applying the prediction model to a processed declarations database to identify and output a file listing the probability of each patient having an adverse health consequence.

3. The computer-implemented method of claim 1, characterized in that the step of processing extracts patients who have been diagnosed with depression or prescribed an antidepressant drug.

4. The computer-implemented method of claim 1, characterized in that the step of defining a set of variables includes defining both dependent and independent variables, and a hospitalization (HL) indicator is defined as a dependent variable, wherein the independent variables are representative of predictive factors and the dependent variable is representative of an adverse health consequence.

5. The computer-implemented method of claim 1, characterized in that the step of defining a set of variables includes defining both dependent and independent variables, and a high cost indicator is defined as a dependent variable, wherein the independent variables are representative of predictive factors and the dependent variable is representative of an adverse health consequence.

6. The computer-implemented method of claim 1, characterized in that the step of defining a set of variables includes defining dependent and independent variables, wherein substantially all the data elements of the statement information, as well as at least one combination of data elements, are used as independent variables.

7. The computer-implemented method of claim 1, characterized in that the step of performing statistical analyses includes performing a logistic regression.

8. An apparatus for generating a model to identify at-risk patients with depression, from information about patients existing in a declarations database, said apparatus comprising: elements for processing, using previously determined criteria, the patient information in the declarations database, to find and extract statement information for a group of patients with depression; a previously determined set of events, derived from the statement information, said events being relevant to depression; elements for creating, using the extracted statement information and the set of events, event level information files; a previously determined time window to provide a time frame from which to judge whether events should be considered in subsequent processing; a set of previously determined variables representing potential predictors; elements for processing, using the time window and the set of variables, the event level information to generate an analysis file; and elements for performing statistical analyses on the analysis file, to generate a prediction model that is used to identify at-risk patients diagnosed with depression, said prediction model being a function of a subset of the set of variables.

9. The apparatus of claim 8, characterized in that it also comprises: elements for applying the prediction model to a processed declarations database to identify and output a probability, for each patient, of having an adverse health consequence.

10. A computer-readable medium containing a program for generating a model to identify at-risk patients diagnosed with depression, from a declarations database containing information about patients, said program on said medium comprising: elements for causing a computer to process, based on previously determined criteria, the patient information in the declarations database, to extract statement information for a group of patients with depression; elements for causing the computer to input a previously determined set of events relevant to depression; elements for causing the computer to create, using the extracted statement information and the defined events, files containing event level information; elements for causing the computer to establish a time window to provide a time frame from which to judge whether events should be considered in subsequent processing; elements for causing the computer to input a set of previously determined variables, representative of potential predictors; elements for causing the computer to process the event level information, using the time window and the set of input variables, to generate an analysis file; and elements for causing the computer to perform statistical analyses on the analysis file, to generate a prediction model that is used to identify at-risk patients diagnosed with depression, said prediction model being a function of a subset of the set of variables.

SUMMARY

A computer-implemented technique, including database processing, is used to identify at-risk patients in a declarations database. The technique includes processing patient information in the declarations database to find and extract statement information for a group of patients with depression. Then, using the extracted information, a set of events relevant to depression is defined. Then, the extracted information and the set of events are processed to create event level information, which is organized with respect to events rather than statements. A time window is defined to provide a time frame from which to judge whether events should be considered in subsequent processing, and a set of variables is defined as potential predictors of adverse health consequences. Subsequently, the event level information is processed, using the time window and the set of variables, to generate an analysis file. Statistical analyses, such as logistic regression, are performed on the analysis file to generate a prediction model, where the prediction model is a function of a subset of the set of variables. Finally, the prediction model is applied to output a list of at-risk patients, diagnosed with depression, who are likely to have adverse health consequences.