WO1998020428A1 - Interactive and automatic processing of text to identify language bias - Google Patents

Interactive and automatic processing of text to identify language bias

Info

Publication number
WO1998020428A1
Authority
WO
WIPO (PCT)
Prior art keywords
bias
text
corpus
user
words
Prior art date
Application number
PCT/US1997/019912
Other languages
French (fr)
Inventor
Linda M. Bland
David Wittmann
Original Assignee
Bland Linda M
David Wittmann
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bland Linda M, David Wittmann filed Critical Bland Linda M
Publication of WO1998020428A1 publication Critical patent/WO1998020428A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique

Definitions

  • This invention relates generally to computer-implemented text processing and more particularly to computer-implemented interactive processing of text including words and phrases from selected groups considered representative of various biases in writing and speech.
  • Computer-implemented methods and apparatus that help a writer to avoid inadvertently incorporating biased text in a written text will be useful to many writers and editors. This will be true for writing or editing text in any language. While it is often considered desirable to avoid bias in writing, there are also situations where a writer or editor may wish to intentionally introduce bias, as in argumentative text, satire, or fiction. An example of such a situation is a fictional text in which the author wants to characterize a character as exhibiting a certain type of bias. Language evolves over time, and cultural mores evolve over time, so that fixed, inflexible solutions to textual bias problems are generally expected to be unsatisfactory.
  • Rosalie Maggio, "The Nonsexist Word Finder: A Dictionary of Gender-Free Usage" (Beacon Press, Boston, MA, 1988).
  • In this specification, "bias" is used to mean the presence of discriminatory, prejudiced, disparaging, stereotyping, or potentially offensive terminology in text.
  • A "bias type" means a particular type of textual bias, such as religious bias, racial bias, etc., expressed in the text.
  • A "bias type code" means a symbol assigned to a particular bias type.
  • The term "corpus" refers to any text, including complete documents and selected portions of text.
  • a computer-implemented search through text for a word or a logical combination of words leads a user to a definition, to an encyclopedia article, or to a database entry related to the meaning of the word searched, or at least to an entry that mentions the word.
  • A survey article, Karen Kukich, "Techniques for Automatically Correcting Words in Text," ACM Computing Surveys, Vol. 24, No. 4, Dec. 1992, pp. 377-439, reviewed research and technology of text word error correction.
  • U.S. Pat. No. 5,442,780 is a recent patent representative of such developments.
  • Some word-processing software programs have so-called “macro” capabilities, which allow a user executing a macro to invoke external programs, such as a computer-implemented thesaurus or the like.
  • Many so-called “search engines” have been developed for various text-searching purposes, such as the searching of electronic texts stored in databases accessible through computer networks.
  • U.S. Pat. No. 5,418,948 is a recent patent related to that field.
  • computer programs have been written to assist lawyers in managing evidence, such as evidence obtained through a discovery process. Such evidence may include textual material.
  • U.S. Pat. No. 5,153,830 to Fisher et al. disclosed a method and apparatus for providing assistance with respect to the development, selection, and evaluation of ideas and concepts. This was described as a computerized aid to creativity and problem solving, using an interactive database comprising two major parts.
  • the first part is a database of several thousand questions for clarifying the task, modifying ideas, and evaluating goals, ideas, and outcomes.
  • the second part is a database expressing the shared concepts of a particular culture, namely American, and idea associations, to which any number of a user's personal, idiosyncratic connections can be added.
  • the Fisher et al. invention allows these associations to be added to those already present.
  • the related art contains many examples of uses of computers applied to natural language or textual data. Despite the extensive research and development in all of these areas of related art and despite an existing need, however, the present inventors are unaware of any related art specifically directed toward detection of textual bias; to the modification, correction, and/or removal of biased terminology from text; to interactive processing of text with respect to biased terminology; or to providing automatic and interactive aids to teaching about bias in writing.
  • a major purpose of the invention is aiding a writer or editor by identifying instances of biased terminology in a corpus of text.
  • a subsidiary purpose is providing such aid without imposing any external or arbitrary cultural norms or standards of style.
  • Another subsidiary purpose is providing such help in a simple system that is operable without requiring artificial intelligence methods or any other kind of analysis of the context, style, syntax, or grammar of the corpus of text.
  • Other purposes include helping writers and editors to detect bias which may have been introduced unconsciously into text, and/or helping writers to detect bias of which the text's author may be unaware.
  • a more particular object of the invention is a method for selectively comparing text with words and phrases organized into groups, each related to a particular type of bias.
  • a related object is a method for selectively comparing text with a group of words and phrases characteristic of particular subject matter. Another important object is a method for offering unbiased alternative expressions to substitute for biased terminology.
  • a related object is providing a set of databases or lexicons, of which each may be directed toward detection of a particular type of bias. Another particular object is a method for bias detection that allows a user to select one or more types of bias to be detected, while optionally disregarding other types of bias.
  • Another related object is providing methods for users to update a database or databases used in bias detection, to reflect new words and/or other changes in the language, or to reflect new bias sensitivities that may develop in the cultural context.
  • Another object is a system for bias detection that is readily adaptable for use in the context of a computer network or internetwork.
  • Yet another object is a system for characterizing a corpus of text with respect to degree of bias. Further objects include methods of training and training systems for instructing students, authors, editors, and others about bias in writing.
  • Another way of expressing the purpose of the invention is to describe it as an automatic expert system containing knowledge or expertise concerning language bias and providing unbiased alternative expressions interactively to a writer or editor.
  • This invention provides computer-implemented interactive processing of text including words and phrases considered representative of various biases in writing. While the invention deals with text data, and thus with matters of expression, it is not any particular expression that characterizes the invention, but rather the computer- implemented processes by which a user's expressions in text may be characterized and/or modified for particular purposes.
  • the methods of the invention help a user identify words and/or phrases from selected groups, each group being defined by a database related to a particular type of potential bias; it then offers optional unbiased alternative words or expressions for each biased word found.
  • the methods and system of the invention can use a much smaller lexicon than a general dictionary of the natural language, smaller even than a spell-checker's database, and thus the invention is usable in a system with limited storage capacity.
  • Although a preferred embodiment is an interactive system, the invention can also be used in a fully automatic mode. Such a fully automatic mode is expected to be useful in time-critical situations, such as a newspaper reporter's writing to a deadline, where the text will be edited after automatic processing.
  • FIG. 1 shows a flow diagram of an overall process performed in accordance with the invention.
  • FIG. 2 shows a flow diagram of a first portion of a preferred process.
  • FIG. 3 shows a flow diagram of a second portion of a preferred process.
  • FIG. 4 shows a flow diagram of a third portion of a preferred process.
  • FIG. 5 shows a flow diagram of a fourth portion of a preferred process.
  • FIG. 6 shows a block diagram of apparatus made in accordance with the invention and implementing a process performed in accordance with the invention.
  • a preferred method of processing a corpus of natural language text in accordance with the invention includes the steps of: (S10) providing first database file(s) of words or expressions potentially expressing bias; (S20) providing a set of bias type codes; (S30) providing second database file(s) including alternative expressions related to each of the words expressing bias; (S40) establishing a correspondence between each of the words expressing bias and at least one of the bias type codes;
  • The process also preferably includes (S110) presenting at least one of the alternative expressions to the user.
  • the system waits (S120) for a user to make a choice.
  • A user has an option (at step S125) of (1) selecting a particular one of the alternative expressions offered, (2) rejecting all of the alternative expressions simply by continuing without selecting any of the proffered alternatives, or optionally (3) editing the text to substitute an un-proffered alternative of the user's own.
  • In a further step (S130), if a user has selected (at step S125) a proffered alternative expression for the matched word, it is substituted for the matched word in the corpus.
  • The process then continues (S140) by repeating the scanning until the entire corpus has been scanned.
  • The end of the corpus is tested at step S150. If the end has been reached, i.e., all words have been scanned (Y), the process ends (S160); if the end has not been reached (N), the scanning process continues at step S50.
  • the corpus processed in this manner may be an entire document or only a portion such as a chapter, page, paragraph, sentence, or word, etc. selected by the user from a larger body of text.
  • If a user wishes to have instances of potential bias only identified, without presentation of any alternative expressions, the steps (S110 - S130) involving presentation of alternatives may be omitted.
  • For fully automatic operation with no user interaction, steps S75 - S125 may also be omitted.
  • The expressions potentially expressing bias may be individual words, phrases including more than one word, suffixes (such as "-ette," "-trix," and "-ess"), etc.
  • the system may be programmed to count instances of bias found, accumulating the count of words expressing bias and a total word count in order to calculate a percentage or other index characterizing the corpus with respect to a particular selected bias type, with respect to a selected set of bias types, or with respect to all bias types.
  • quantitative bias indices or "scores" may be assigned by the system to any corpus of text, indicating various measures of bias.
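As a rough illustration of the counting and scoring just described, the following sketch tallies matches per bias-type code and reports a biased-words-per-hundred-words index. The token list, the lexicon shape, and the specific index formula are assumptions made for illustration, not a format prescribed by this specification.

```python
from collections import Counter

def bias_scores(corpus_words, bias_lexicon):
    """Count bias hits per bias-type code and compute simple bias indices.

    corpus_words: list of lower-cased tokens from the corpus.
    bias_lexicon: dict mapping a potentially biased word to its bias-type code,
                  e.g. {"chairman": "GN"} (toy single-word entries; real entries
                  may be phrases or suffixes, which are handled by the scan step).
    """
    hits = Counter()
    for word in corpus_words:
        code = bias_lexicon.get(word)
        if code is not None:
            hits[code] += 1
    total = len(corpus_words) or 1                 # guard against an empty corpus
    overall = 100.0 * sum(hits.values()) / total   # biased words per 100 words
    per_type = {code: 100.0 * n / total for code, n in hits.items()}
    return overall, per_type

# Toy usage with an assumed gender-bias ("GN") entry:
tokens = "the chairman thanked every chairman personally".lower().split()
print(bias_scores(tokens, {"chairman": "GN"}))
```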
  • The alternative expressions proffered to a user by the system preferably come from individual databases of alternative expressions related to each bias type. These alternative expressions are not necessarily synonyms of the potentially biased expressions detected, but in some cases they may be bias-neutral synonyms.
  • the databases containing alternative expressions are preferably constructed from a large corpus of actual diverse texts written by experts in inclusive and diversity-conscious expression or written by diverse authors who have found and used alternative expressions that are unbiased or at least less biased than the expressions they are intended to replace. For any given biased expression, there may be one or more alternative expressions. If alternative expressions are offered to a user, they may be offered in a particular order, e.g., ranked in a preferred order of usefulness.
  • Each individual database of alternative expressions related to a particular bias type may be provided to the user in the form of a computer-readable medium.
  • Such computer-readable media may include magnetic diskettes, magnetic tapes, CD-ROMs, recordable optical disks, fixed or removable hard disks, memory cards, data cartridges, etc. Of course each of these media may include one or more alternative expression databases.
  • Another embodiment of the invention may be made in which all the treated bias-types are contained in a single database, having one or more bias-type codes being associated with each potentially biased expression in the database.
  • The preferred embodiment has a number of modular databases, with each modular database being substantially dedicated to a particular bias type. This organization of the data allows for easier and more flexible updating and/or customization of the databases, and allows for customization of the overall system for particular needs of each user. Thus a particular user primarily interested in gender bias to the exclusion of other types of bias, for example, is able to install just a gender-bias database.
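One way to picture the modular organization described in the preceding item is a separate database file per bias type, loaded only when that bias type is installed and selected. The sketch below assumes a hypothetical CSV layout (biased word, bias-type code, alternative expression, weight); the actual file structures are those of Table 1.

```python
import csv
from collections import defaultdict
from pathlib import Path

def load_selected_lexicons(db_dir, selected_codes):
    """Build a combined lookup from modular per-bias-type database files.

    Assumes one CSV file per bias type, named <CODE>.csv, with rows of
    biased_word, bias_type_code, alternative, weight (a hypothetical layout).
    Returns {biased_word: [(bias_type_code, alternative, weight), ...]}.
    """
    lookup = defaultdict(list)
    for code in selected_codes:
        path = Path(db_dir) / f"{code}.csv"
        if not path.exists():              # this bias type is not installed; skip it
            continue
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                if not row:
                    continue
                word, type_code, alternative, weight = row
                lookup[word.lower()].append((type_code, alternative, int(weight)))
    return lookup

# A user interested only in gender bias installs and selects just "GN":
# lexicon = load_selected_lexicons("bias_db", ["GN"])
```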
  • the system responds at various stages of the process in accordance with choices made by the user.
  • FIGS. 2 - 5 show flowcharts describing the steps of a preferred process performed by the system.
  • the reference numerals S200 etc. refer to steps and substeps of the process.
  • Other reference numerals, such as E01, are labels to identify branching points in the process.
  • The term "procedure" is used for some steps having several substeps. It will be recognized by those skilled in the art that the same steps or equivalent functions may be programmed in various programming languages, and adapted to the particular syntax and logical structures appropriate to the programming language selected.
  • a control file contains records related to various bias types.
  • Each bias word database file contains records related to words that potentially express bias of that bias type.
  • each alternative expression may have a quantitative numerical weight factor, higher weight factors indicating higher preference for replacement of the biased word.
  • a particular substitution of an alternative expression is preferably made or not made in accordance with statistical weighting by the numerical weight factors.
  • These numerical weight factors are preferably assigned while constructing the alternative-expression database.
  • the weight factors are preferably derived from statistical analysis of a large corpus of actual diverse texts written by experts in inclusive and diversity-conscious expression.
  • the factors may be assigned by a consensus of diverse authors who have found and used alternative expressions that are unbiased or at least less biased than the biased expressions they are intended to replace.
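A minimal sketch of how these weight factors might drive the choice of an alternative expression: in interactive mode the alternatives are simply ordered by descending weight for display, while in automatic mode one alternative is drawn at random with probability proportional to its weight. Reading "statistical weighting" as weighted random choice is one plausible interpretation, and the data shape is assumed.

```python
import random

def pick_alternative(alternatives, interactive=False):
    """Choose among alternative expressions using their numerical weight factors.

    alternatives: list of (expression, weight) pairs, weight nominally 1-99.
    Interactive mode: return the list sorted by descending weight for display.
    Automatic mode: return one expression chosen by weighted random selection.
    """
    if interactive:
        return sorted(alternatives, key=lambda pair: pair[1], reverse=True)
    expressions = [expr for expr, _ in alternatives]
    weights = [w for _, w in alternatives]
    return random.choices(expressions, weights=weights, k=1)[0]

# e.g. pick_alternative([("trash collector", 80), ("sanitation worker", 60)])
```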
  • Some representative file structures that are useful in the efficient performance of the process of the invention are illustrated in Table 1. As illustrated in FIG. 2, the detailed process starts (S100) with step S200, invoking a startup procedure (S230), displaying an introductory screen (S235) with the system's title, such as "Bias Detective™," for a time interval, acquiring access to bias word database files (S240), loading a bias control file into working memory (S250), and displaying a data entry screen (S260). If the user wishes to find and correct biased language in a document (selected in step S210), the process includes invoking a procedure (S215) for importing the selected document (S265).
  • Step S270 handles the event of a keystroke, cursor movement, mouse movement, etc. initiated by the user.
  • the user may choose to begin scanning a document for biased words (S400) (e.g., by pressing function key F2), choose to select bias type(s) (S300) (e.g., by pressing function key F6), or may type another key (S280).
  • step S220 performs an orderly ending procedure, prompting the user to save the text file being processed, closing open files, etc.
  • FIG. 3 illustrates substeps of the bias-type selection procedure (S300) and the bias word scan procedure (S400).
  • Step S305 clears the screen and displays a bias-type selection screen. (In a windowing environment, the screen is not cleared, and step S305 displays a bias-type selection window superimposed on a portion of the screen. It will be understood that other steps of the process may also be implemented in a conventional manner for windows.)
  • Step S310 displays a first bias control file entry.
  • At step S320, a decision is made: if there are remaining bias control file entries to display (Y), step S330 displays the next bias control file entry and the process returns to S320.
  • If there are no more bias control file entries to display (N), a cursor is moved (S340) to a first selection indicator position for the bias control file entries displayed.
  • When step S340 has been completed, several bias types are displayed for the user's selection.
  • At step S350, user action is expected, and the system waits for the user to make a choice.
  • The user may select a bias type, e.g., by typing the character "X" to choose (or the space character to reject or "un-select") the bias type represented at the cursor position.
  • a user can select more than one bias type. For each bias type selected, the system will scan the corpus of text to detect that type of bias and proffer alternative expressions to the user for each bias word detected, as described hereinbelow.
  • A decision is made (S350) depending on the character typed by the user. If the user types any character except "X," space, or Esc, the system loops (C01) to allow the user to view bias-type selections made so far, and the system waits for another selection entry or Esc. If the user types the Esc key, step S370 clears the bias criteria screen or removes its window, re-displays the data entry screen, and returns at point E01 to allow the user to begin scanning the corpus of text.
  • If the user types an "X" (or a space), step S380 saves the control file select indicators and moves the cursor to the next bias-type selection indicator, returning to branch point C01.
  • Help text may be provided to the user, to explain and distinguish the various bias types and to aid the user in making bias type selections.
  • help text can describe the differences between race and nationality, or the differences between cultural bias and religious bias.
  • the help text can be made context-sensitive in a manner known to those skilled in the art.
  • the corpus of text to be scanned for bias may include any string of character data, and may also include formatting indicators such as paragraph markers, etc.
  • Examples of such a corpus include a document previously stored in a file such as a conventional word-processing file, text typed by the user for display on the data entry screen, or a text portion selected by the user and identified by any of the selection means conventionally employed.
  • the text selected at any time is often indicated by highlighting the selected text in a distinctive color or reverse video.
  • The bias word scan procedure, generally denoted by S400, scans a corpus of text (e.g., a document) for bias words, i.e., words that potentially express bias.
  • Bias word scan procedure S400 begins by saving the current cursor position (S405) and then moving to the beginning of the corpus to be scanned (S410).
  • the corpus to be scanned is represented by data in a portion of working memory, which may be an array.
  • At step S420, a decision is made depending on whether the end of the corpus has been reached.
  • (Step S420 is executed at several points of the scanning process, indicated by B01, to determine whether the scanning procedure is complete.) If the end of the corpus has been reached (Y), step S430 re-displays the data entry screen as it was prior to the scan, returns the cursor to the position previously saved, and returns (E01) to execute decision step S270. If the end of the corpus has not been reached (N), step S440 gets the next word in the corpus (in a conventional manner known and commonly used in word processor spell-checking). As mentioned above, the term "word" as used herein includes phrases, suffixes, etc. Step S450 looks up the word in the bias word database file, i.e., searches for a match in a conventional manner.
  • At step S460, if the word was not found in the bias word database file, step S420 is repeated (B01). If the word was found, step S470 is performed, which calls control file bias-type match procedure S600, described below with reference to FIG. 4. Then, at decision step S480, the system determines whether the bias type(s) of the found bias word is a selected bias type, i.e., whether its bias type has an "X" in the control file. If the bias type was not selected (N), the process returns (B01) to step S420.
  • If the bias type was selected (Y), step S490 highlights the word on the display and calls procedure S700 to get alternate expression(s) for the found bias word.
  • Step S500 saves the screen, including the highlighted word and cursor position.
  • Step S510 calls procedure S800, and step S520 calls procedure S900, both described in detail below.
  • the system has identified a word matching one of the words in the database of a user-selected bias type and will use procedure S700 to proffer one or more alternative expressions for the user's consideration, by displaying the alternative expression(s) using procedure S800.
  • the system then provides means for the user to select an alternative expression (or reject all of the alternative expressions proffered) using procedure S900.
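The scan-and-proffer sequence outlined in the preceding items (roughly procedures S400 through S900) could be reduced to a console loop such as the sketch below. The lexicon shape, the prompts, and the token-at-a-time matching are illustrative assumptions; the procedures described in the specification also handle phrases, suffixes, screen highlighting, and help text, which are omitted here.

```python
def scan_corpus_interactive(words, lexicon, selected_codes):
    """Scan tokens and, for each match of a selected bias type, proffer alternatives.

    words: list of tokens making up the corpus (modified in place).
    lexicon: {biased_word: [(bias_type_code, alternative), ...]}  (assumed shape).
    selected_codes: bias-type codes the user chose to scan for.
    """
    for i, word in enumerate(words):
        entries = lexicon.get(word.lower(), [])
        # Keep only entries whose bias type the user selected (cf. procedure S600).
        entries = [(code, alt) for code, alt in entries if code in selected_codes]
        if not entries:
            continue
        print(f'Potential bias ({entries[0][0]}): "{word}"')
        for n, (_, alt) in enumerate(entries, start=1):
            print(f"  {n}) {alt}")
        choice = input("Number to substitute, or press Enter to keep the word: ")
        if choice.isdigit() and 1 <= int(choice) <= len(entries):
            words[i] = entries[int(choice) - 1][1]   # substitute (cf. step S960)
    return words
```

In this simplification, rejecting all proffered alternatives is just pressing Enter, which corresponds to the option of continuing without selecting any alternative.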
  • FIG. 4 shows the process flow for procedures S600 and S700.
  • Procedure S600 starts by examining (S605) the first control file record.
  • At step S610, a decision is made based on whether the instant control file record's selection indicator is an "X."
  • If the indicator is an "X" (Y), step S620 tests whether the bias type of the found bias word is the same as the control file record bias type. If, at step S610, the control file record's selection indicator is not an "X" (N), then step S630 tests whether this is the last control file record. At step S620, if the bias type of the found bias word is the same as the control file record bias type (Y), the process exits procedure S600 and continues (B03) at step S480 (FIG. 3).
  • If the bias types are not the same (N), the process continues at step S630. At step S630, if this is the last control file record (Y), a match was not found for the bias type of the found bias word in the control file records that are selected for bias scanning, and the process continues at step S480 (B03). If it is not the last record (N), then step S650 moves to the next control file entry, and continues (BT1) control file bias-type match procedure S600 at step S610.
  • Procedure S700, which gets alternative words, begins at step S705 by saving the first alternative word attached to the first lookup in the bias word database file. Step S710 tests whether this is the last alternative word in the bias word database file. If it is (Y), the process continues (B04) at step S500. If it is not (N), the process continues at step S720, looking up the next entry of the found bias word in the bias word database file. The next alternative word is saved, and the process returns (AW1) to step S710.
  • Procedure S800 begins with step S805, highlighting the found bias word on the display screen.
  • Step S810 displays a box or window listing all the alternative words found in the bias word database file. Thus the alternative words are proffered to the user.
  • Step S820 looks up the found bias word in the bias help database file.
  • Step S830 displays a box or window containing help text to explain bias implication(s) of the highlighted bias word. Steps S820 and S830 may be omitted in embodiments lacking an optional bias help database file.
  • Control then returns (B05) to step S520, which invokes procedure S900 for user selection of an alternate expression.
  • Alternate word selection procedure S900 begins with step S905, moving the cursor to the first alternate word in the alternate word selection box and waiting for a user keystroke. Successive decision steps S910 and S940 switch to various steps depending on the user's keystroke. If the keystroke is a particular function key, such as F2, step S920 removes the highlight from the found word, removes box(es) from the screen, restores the scan screen, and ends procedure S900 and returns (B06). If the keystroke is the Esc key, step S930 moves to the end of the corpus and ends procedure S900. If the keystroke is a cursor arrow, step S950 moves the cursor in the indicated direction within the alternate word box.
  • If the keystroke is any other key except <Enter> (carriage return), the process returns (AS1) to step S910.
  • A user's pressing <Enter> selects the alternate word at the current cursor position in the alternate word selection box, and step S960 inserts the selected word into the scan screen, replacing the highlighted bias word on the screen and in the text corpus. Then step S980 restores the scan screen, and procedure S900 ends.
  • Step S970 begins procedure S1000 to replace the found bias word at other locations in the corpus.
  • Step S1010 prompts the user for response indicating whether to replace the next instance of the found bias word, all instances of that bias word, or no more instances of that bias word. If the response indicates no more instances, the procedure branches to S980 (label AW3). If the response indicates all instances, the procedure branches to S1020, which will replace all instances in the corpus with the selected alternative expression. When the end of the corpus is reached, the procedure branches to S980 (label AW3).
  • Step S1030 locates the next instance of the current found bias word and prompts the user for a response to replace, skip, or exit. Step S1040 handles the response.
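The next/all/no-more behaviour of procedure S1000 might be sketched as the loop below; the `ask` callback stands in for the interactive prompt, and its return labels are assumptions rather than the specification's wording.

```python
def replace_other_instances(words, found_word, replacement, ask):
    """Replace later instances of found_word according to the user's answers.

    ask(index) is a callable returning "replace", "all", "skip", or "exit" for
    the instance at that index; in the real system this is an interactive prompt.
    """
    replace_all = False
    for i, word in enumerate(words):
        if word != found_word:
            continue
        if replace_all:
            words[i] = replacement
            continue
        answer = ask(i)
        if answer == "replace":
            words[i] = replacement
        elif answer == "all":
            replace_all = True
            words[i] = replacement
        elif answer == "exit":
            break
        # "skip" simply falls through to the next instance.
    return words
```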
  • When the process described above is modified to process a corpus of text for automatically (non-interactively) detecting and correcting bias in the corpus, the first database file of words expressing bias, the set of bias-type codes, and the second database file including at least one alternative expression related to each of said words expressing bias are still provided, and each word expressing bias is associated with at least one bias-type code.
  • The corpus is scanned while comparing each word of the corpus with the first database until a word of the corpus matches one or more of the words expressing bias, thus identifying a matched word and at least one of said bias-type codes. If no match is found, the process may include presenting a message that no bias was found.
  • the process automatically replaces each matched word with one of the alternative expressions from the second database, and continues scanning through the corpus until the entire corpus has been scanned.
  • the result is a physically modified corpus of text having unbiased alternative expressions substituted for every biased word found in the input text.
  • this type of fully automatic detection and correction is especially useful in time-critical situations such as daily newspaper production, and especially when the processed text is edited after processing in this manner.
  • the second database should include more than one alternative expression related to each of the words expressing bias. Then, at each instance after the first instance of replacing the matched word with an alternative expression, a different alternative expression is substituted.
  • the selection of alternative expression at each instance may be made according to the predetermined order in which alternative expressions occur in the second database.
  • the selection is made according to successively smaller numerical weight factors, where these factors have been stored in association with each alternative expression in the alternative-expression database. The assignment of numerical weight factors is described above in connection with Table 1.
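A sketch of this fully automatic mode, under the assumption that the alternatives for a word are consumed in order of decreasing weight and then cycled if the biased word recurs more often than it has alternatives; the cycling rule and the lexicon shape are assumptions for illustration.

```python
from itertools import cycle

def auto_debias(words, lexicon):
    """Replace every biased word automatically, varying the substitute used.

    words: list of tokens (modified in place).
    lexicon: {biased_word: [(alternative, weight), ...]} (assumed shape).
    """
    rotations = {}
    for i, word in enumerate(words):
        alternatives = lexicon.get(word.lower())
        if not alternatives:
            continue
        if word.lower() not in rotations:
            ordered = [alt for alt, _ in
                       sorted(alternatives, key=lambda pair: pair[1], reverse=True)]
            rotations[word.lower()] = cycle(ordered)
        words[i] = next(rotations[word.lower()])
    return words

# Toy usage with assumed entries:
# auto_debias(tokens, {"chairman": [("chair", 90), ("chairperson", 70)]})
```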
  • FIG. 6 shows a block diagram of apparatus made in accordance with the invention, and/or used to implement a process performed in accordance with the invention, in a preferred embodiment.
  • the apparatus of FIG. 6 may be implemented as a dedicated word processor, i.e., a computer implemented system used only for word processing tasks.
  • the methods and processes described above may also be implemented to function with a general-purpose digital computer of a conventional type, also illustrated schematically by the apparatus of FIG. 6.
  • the system 10 has a text input 20, a central data processing unit (CPU) 30 of a known type, a memory 40 for storing bias codes, a memory 50 for storing words and phrases related to bias and their corresponding bias-type codes, and output apparatus 60 for presenting text output to a user.
  • Output apparatus 60 may be a display screen, teletypewriter, modem, or printer, for example.
  • the text input 20 may be, for example, a keyboard, a modem, a scanner with OCR (optical character reading) capability, a disk or tape drive accepting disks or tapes containing textual data, or a card or cartridge reader accepting memory cards, data cartridges, etc. carrying textual data.
  • a preferred system has both a keyboard and a storage medium reader such as a diskette drive for text input 20.
  • a device particularly useful in the system is a computer-readable storage medium 25 (such as a memory cartridge or a diskette) containing two or more databases, including a first database 26 of words expressing bias, and a second database 27 including at least one alternative expression for each of the words expressing bias.
  • The first database has, for each of the words expressing bias, a bias-type code as defined in this specification.
  • the second database 27 may also include a numerical weight factor for each alternative expression.
  • the first and second databases 26 and 27 may be combined into one database.
  • This computer-readable storage medium may also include a third database 28 including help messages and/or training messages with explanation s) of particular bias types, examples of groups that may find a particular bias word offensive, rationale for including a word in the first database, etc.
  • third database 28 is especially useful in a computer-based system for language-bias education.
  • This third database 28 may have, associated with each entry, a bias word and/or a bias type, to facilitate context-sensitive help by the system.
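Keying the third (help/training) database by bias word and bias-type code, as described here, suggests a lookup along the following lines; the in-memory form and the fallback message are assumptions, while the sample entry is taken from Table 1.

```python
# Hypothetical in-memory form of the third (help/training) database 28,
# keyed by (bias word, bias-type code) to support context-sensitive help.
HELP_DB = {
    ("garbage man", "GN"): "One may not want to promulgate the idea that "
                           "only men are appropriate for this occupation.",
}

def context_help(word, bias_code):
    """Return the help or training message for a found bias word, if any."""
    return HELP_DB.get((word.lower(), bias_code),
                       "No help text is available for this entry.")

print(context_help("Garbage man", "GN"))
```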
  • Each of the provided computer-readable storage medium devices may be limited to one bias type or may include several bias types in the same device.
  • databases 26, 27, and 28 contained in computer-readable storage medium 25 are read into working memory.
  • Medium 25 may also include a control file database such as described in Table 1 and the accompanying text.
  • the bias word database and bias word help database in medium 25 also may have the file structures illustrated in Table 1.
  • the system has a memory 70 for storing alternative expressions and optionally storing bias-type codes. It may optionally have an audio input 80 and/or an audio output 90.
  • An audio input 80 may perform the function of text input 20 when used with optional speech recognition apparatus (not shown) or with speech-recognition software operating on CPU 30.
  • The CPU 30 is programmed to perform the process described above, including the steps of comparing text entered into the system through text input 20 with words and phrases related to bias in memory 50, and presenting bias codes retrieved from memory 50.
  • the software program performing these functions also may provide for optionally substituting a selected alternative expression for a matched word, as determined by a user.
  • Memories 40, 50, and 70 may be combined; preferably they are combined into a suitably-organized single memory.
  • Program instructions directing the operation of CPU 30 to perform the process steps described hereinabove may be stored in a read-only memory (ROM) of CPU 30, in a random-access memory (RAM), or in a portion of a main memory, another portion of which is used as working memory.
  • the working memory may be combined with memories 40, 50, and 70.
  • System 10 may have a network port 95 for connecting with a conventional local or wide-area network, and particularly for connecting with an internetwork such as the Internet, to allow the method of the invention to be used by a multiplicity of users sharing access to system 10.
  • Teachers may use the invention to instruct their students about biases in their writing and about methods for recognizing and/or avoiding biases.
  • Librarians, educational textbook buyers, and educators in general may use the invention to characterize text materials with respect to biases.
  • Reviewers of books, articles, and the like may use the invention to characterize the works they review. The many diverse uses of the invention are expected to proliferate and to become more facile as more text becomes available in machine-readable form.
  • Another way is a method in which the user has the option, at each detection of an instance of potential bias, of disregarding the bias notification and proceeding without modifying the text.
  • Yet another way is a method whereby the user may modify or augment the existing bias-type databases, by adding terms, subtracting terms, modifying terms, or even by defining a new bias-type database beyond those databases initially provided.
  • the producer of the invention may from time to time provide database updates that reflect changes in language bias.
  • the user may edit these files with any suitable conventional editor, or with a special editor facility provided that is customized to prompt the user for adding, deleting, or modifying entries in each of the database files described herein.
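A sketch of the kind of editor facility mentioned here, assuming each bias word database file is a plain CSV of word, bias-type code, alternative expression, and weight; the file layout is hypothetical, not a structure mandated by the specification.

```python
import csv

def add_entry(db_path, word, bias_code, alternative, weight):
    """Append a user-defined entry to a bias word database file (assumed CSV)."""
    with open(db_path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([word, bias_code, alternative, weight])

def remove_entries(db_path, word):
    """Delete every record for a word the user no longer wants flagged."""
    with open(db_path, newline="", encoding="utf-8") as f:
        rows = [r for r in csv.reader(f) if r and r[0].lower() != word.lower()]
    with open(db_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

# add_entry("gender_bias.csv", "chairman", "GN", "chairperson", 70)
# remove_entries("gender_bias.csv", "chairman")
```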
  • a context-sensitive help message may be associated with each word or phrase expressing bias.
  • the help message can include an explanation of the particular type of bias, examples of groups that may find the biased text offensive, and/or other helpful information.
  • Such help messages may be provided in bias word help database file(s) as described hereinabove.
  • the invention can be implemented in an embodiment to be invoked by a macro in conjunction with a word-processing program, or it can be implemented as a module for a computerized aid to creativity and problem solving (such as the one described in U.S. Pat. No. 5,153,830 mentioned above). It will be recognized that particular file structures may be employed that are functionally equivalent to those illustrated herein. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the invention being defined by the following claims.

Abstract

A system (10) that interactively and automatically processes text to identify language bias has a text input (20), a central processing unit (CPU) (30), a memory (40) for storing bias codes, a memory (50) for storing words and phrases related to bias, and output apparatus (60) for presenting text output to a user. A computer-readable storage medium device (25) contains databases including a first database (26) of words expressing bias and a second database (27) including at least one alternative expression for each of the words expressing bias. The first database (26) has a bias-type code associated with each of the words expressing bias. The computer-readable storage medium may also include a third database (28) including help messages and/or training messages for language-bias education. Each of the provided computer-readable storage medium devices may be limited to one bias type or may include several bias types in the same device.

Description

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE AS RECEIVING OFFICE
TITLE OF THE INVENTION
INTERACTIVE AND AUTOMATIC PROCESSING OF TEXT TO IDENTIFY LANGUAGE BIAS
FIELD OF THE INVENTION This invention relates generally to computer-implemented text processing and more particularly to computer-implemented interactive processing of text including words and phrases from selected groups considered representative of various biases in writing and speech.
BACKGROUND OF THE INVENTION In every written language, certain words and phrases that evoke responses in a reader may be considered biased in some cultural context. Typical of such biases are racial, religious, gender, age, ethnic, sexual orientation, political, disability, and occupational biases. In English writing, for example, one may encounter text including particular words considered derogatory toward a particular racial or religious group. Or, with respect to gender bias for another example, one may encounter text implying that a nurse is female, although it is known that not all nurses are female. Many writers and editors now use text-editing or so-called word-processing software implemented on computers or dedicated word processors. Computer-implemented methods and apparatus that help a writer to avoid inadvertently incorporating biased text in a written text will be useful to many writers and editors. This will be true for writing or editing text in any language. While it is often considered desirable to avoid bias in writing, there are also situations where a writer or editor may wish to intentionally introduce bias, as in argumentative text, satire, or fiction. An example of such a situation is a fictional text in which the author wants to characterize a character as exhibiting a certain type of bias. Language evolves over time, and cultural mores evolve over time, so that fixed, inflexible solutions to textual bias problems are generally expected to be unsatisfactory. Thus from several points of view, programmable computer-implemented processing of text with respect to biased terminology is useful, and interactive processing methods are particularly useful. Such processing can be especially important in situations where written expressions of bias may be involved in violation of laws governing libel or discrimination. Speech writers can use such processing of the text of a speech, to avoid language that might alienate some members of an audience. More commonly perhaps, the mitigation of bias in texts can be important to advertisers and publishers for ensuring that the texts published will appeal to a broad, diverse readership.
The following list identifies some books and dictionaries in English related to bias in writing, the disclosure of each in its entirety being incorporated herein by reference:
Francine Frank, "Language and the Sexes" (State University of New York Press, Albany, NY, 1983).
Rosalie Maggio, "The Nonsexist Word Finder: A Dictionary of Gender-Free Usage" (Beacon Press, Boston, MA, 1988).
Casey Miller and Kate Swift, "The Handbook of Nonsexist Writing" 2nd Edition (Harper & Row Publishers, New York, 1988).
Francine Frank et al., "Language, Gender, and Professional Writing: Theoretical Approaches and Guidelines for Nonsexist Usage" (Modern Language Association of America, New York, 1989).
Val Dumond, "The Elements of Nonsexist Usage" (Simon & Schuster, Inc., New York, 1990).
Rosalie Maggio, "The Bias-Free Word Finder" (Beacon Press, Boston 1991).
The Research and Training Center on Independent Living, "Guidelines for Reporting and Writing about People with Disabilities" Fourth Edition (Research and Training Center on Independent Living, Lawrence, Kansas, 1993).
American Psychiatric Association, "Mental Illness Awareness Guide [for the] Media" (American Psychiatric Association Division of Public Affairs, Washington, D.C. 1994).
Sara Mills, "Feminist Stylistics" (Routledge, New York, 1995).
Marilyn Schwartz and The Task Force on Bias-Free Language of the Association of American University Presses, "Guidelines for Bias-Free Writing" (Indiana University Press, Bloomington, Indiana, 1995).
The references listed above are written in English and are directed to some extent toward the treatment of texts in English (the native language of the present inventors). The invention described herein may be suitably applied to any written language, natural or artificial, and the citation of these English texts does not imply any intent by the present inventors to diminish the importance of any other language, or to limit application of the invention to only English language texts.
NOMENCLATURE
The terms "word" and "words" as used throughout this specification and the appended claims are meant to include a phrase, i.e., a group of words used together. To the extent that bias is sometimes expressed by a suffix appended to an otherwise bias-neutral word, such suffixes are also encompassed within the intended meaning for the terms "word" and "words." In the context of this specification, "bias" is used to mean the presence of discriminatory, prejudiced, disparaging, stereotyping, or potentially offensive terminology in text. A "bias type" means a particular type of textual bias, such as religious bias, racial bias, etc. expressed in the text. A "bias type code" means a symbol assigned to a particular bias type. The term "corpus" refers to any text, including complete documents and selected portions of text.
DESCRIPTION OF THE RELATED ART From one point of view, the subject matter of this invention falls within the general field of computational linguistics. Several books have surveyed that field, including "Computers in Linguistics" by Christopher Butler (Basil Blackwell Ltd. , Oxford, UK 1985) and "Computational Linguistics - An Introduction" by Ralph Grishman (Cambridge University Press, Cambridge, UK 1986).
In the related art, many computer-implemented methods have been developed for searching a corpus of text for a selected word, and many commercially available spell-checking programs are commonly provided for use with word-processing computer programs. These are generally directed to finding words that match in spelling or to detecting spelling errors. U.S. Pat. Nos. 4,787,059; 4,797,855; and 4,859,091, for example, describe such methods. In some cases, such as computer- implemented dictionaries, encyclopedias, and databases, a computer-implemented search through text for a word or a logical combination of words leads a user to a definition, to an encyclopedia article, or to a database entry related to the meaning of the word searched, or at least to an entry that mentions the word. A survey article, Karen Kukich "Techniques for Automatically Correcting Words in Text" ACM Computing Surveys, Vol. 24, No. 4 Dec. 1992 pp. 377 - 439 reviewed research and technology of text word error correction. There is a large field of art directed to interpretation of natural-language queries to provide access to data stored in databases. U.S. Pat. No. 5,442,780 is a recent patent representative of such developments. Some word-processing software programs have so-called "macro" capabilities, which allow a user executing a macro to invoke external programs, such as a computer-implemented thesaurus or the like. Many so-called "search engines" have been developed for various text-searching purposes, such as the searching of electronic texts stored in databases accessible through computer networks. U.S. Pat. No. 5,418,948 is a recent patent related to that field. In another field of related art, computer programs have been written to assist lawyers in managing evidence, such as evidence obtained through a discovery process. Such evidence may include textual material. Modern lexicographers use computers in the preparation of dictionaries and thesauruses, e.g., to produce word counts, indices, and concordances, to detect homographs, and to aid in the organization of word-forms under appropriate headings, etc. A review of problems and challenges in natural language processing including computational lexicography was published in the book "Challenges in Natural Language Processing" (M. Bates and R. M. Weishedel, eds., published by Cambridge
University Press, Cambridge, England (1993)). The role of lexicons was reviewed in an article by Louise Guthrie et al. "The Role of Lexicons in Natural Language Processing" Communications of the ACM, Vol. 39, No. 1 January 1996 pp. 63 - 72. In yet another field of related art, literary and Biblical researchers have developed a number of statistical textual analysis methods for characterizing a corpus of text as to vocabulary, style, etc. U.S. Pat. No. 5,323,310, for example, describes a method of analyzing textual documents to compare translations. Other excellent examples of text analysis research are those works that have statistically characterized the known writings of William Shakespeare. The statistical results have been used to test texts of unknown authorship to determine if their style is consistent with Shakespeare's known writings. Related analysis is described in the article by E. Dolnick, "The Ghost's Vocabulary" (The Atlantic, Vol. 268 No. 4, Oct. 1991, pp. 82 - 86).
Yet another area of related art is interactive prompting of users with terms related to a concept. U.S. Pat. No. 5,153,830 to Fisher et al. disclosed a method and apparatus for providing assistance with respect to the development, selection, and evaluation of ideas and concepts. This was described as a computerized aid to creativity and problem solving, using an interactive database comprising two major parts. The first part is a database of several thousand questions for clarifying the task, modifying ideas, and evaluating goals, ideas, and outcomes. The second part is a database expressing the shared concepts of a particular culture, namely American, and idea associations, to which any number of a user's personal, idiosyncratic connections can be added. When the user comes up with his own associations, the Fisher et al. invention allows these associations to be added to those already present.
The related art contains many examples of uses of computers applied to natural language or textual data. Despite the extensive research and development in all of these areas of related art and despite an existing need, however, the present inventors are unaware of any related art specifically directed toward detection of textual bias; to the modification, correction, and/or removal of biased terminology from text; to interactive processing of text with respect to biased terminology; or to providing automatic and interactive aids to teaching about bias in writing.
PURPOSES, OBJECTS, AND ADVANTAGES OF THE INVENTION
A major purpose of the invention is aiding a writer or editor by identifying instances of biased terminology in a corpus of text. A subsidiary purpose is providing such aid without imposing any external or arbitrary cultural norms or standards of style. Another subsidiary purpose is providing such help in a simple system that is operable without requiring artificial intelligence methods or any other kind of analysis of the context, style, syntax, or grammar of the corpus of text. Other purposes include helping writers and editors to detect bias which may have been introduced unconsciously into text, and/or helping writers to detect bias of which the text's author may be unaware. A more particular object of the invention is a method for selectively comparing text with words and phrases organized into groups, each related to a particular type of bias. A related object is a method for selectively comparing text with a group of words and phrases characteristic of particular subject matter. Another important object is a method for offering unbiased alternative expressions to substitute for biased terminology. A related object is providing a set of databases or lexicons, of which each may be directed toward detection of a particular type of bias. Another particular object is a method for bias detection that allows a user to select one or more types of bias to be detected, while optionally disregarding other types of bias. Another related object is providing methods for users to update a database or databases used in bias detection, to reflect new words and/or other changes in the language, or to reflect new bias sensitivities that may develop in the cultural context. Another object is a system for bias detection that is readily adaptable for use in the context of a computer network or internetwork. Yet another object is a system for characterizing a corpus of text with respect to degree of bias. Further objects include methods of training and training systems for instructing students, authors, editors, and others about bias in writing. Another way of expressing the purpose of the invention is to describe it as an automatic expert system containing knowledge or expertise concerning language bias and providing unbiased alternative expressions interactively to a writer or editor.
These and other purposes, objects, and advantages will become apparent from a reading of this specification along with the accompanying drawings and the appended claims.
BRIEF SUMMARY OF THE INVENTION
This invention provides computer-implemented interactive processing of text including words and phrases considered representative of various biases in writing. While the invention deals with text data, and thus with matters of expression, it is not any particular expression that characterizes the invention, but rather the computer- implemented processes by which a user's expressions in text may be characterized and/or modified for particular purposes. The methods of the invention help a user identify words and/or phrases from selected groups, each group being defined by a database related to a particular type of potential bias; it then offers optional unbiased alternative words or expressions for each biased word found. Because each database contains only words related to a particular type of bias, the methods and system of the invention can use a much smaller lexicon than a general dictionary of the natural language, smaller even than a spell-checker's database, and thus the invention is usable in a system with limited storage capacity. Although a preferred embodiment is an interactive system, the invention can also be used in a fully automatic mode. Such a fully automatic mode is expected to be useful in time-critical situations, such as a newspaper reporter's writing to a deadline, where the text will be edited after automatic processing.
BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 shows a flow diagram of an overall process performed in accordance with the invention.
FIG. 2 shows a flow diagram of a first portion of a preferred process.
FIG. 3 shows a flow diagram of a second portion of a preferred process.
FIG. 4 shows a flow diagram of a third portion of a preferred process.
FIG. 5 shows a flow diagram of a fourth portion of a preferred process.
FIG. 6 shows a block diagram of apparatus made in accordance with the invention and implementing a process performed in accordance with the invention.
DETAILED DESCRIPTION OF THE INVENTION
Copyrighted materials
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the U.S. Patent and Trademark Office files or records, but otherwise reserves all copyright rights whatsoever.
Overall process
In this description of the preferred embodiments, an overall process used in the invention is described first at a certain level of abstraction, followed by more detailed description of specific methods implementing that overall process. The overall process is shown in FIG. 1, in which references S10, S20, etc. refer to steps of the process. A preferred method of processing a corpus of natural language text in accordance with the invention includes the steps of: (S10) providing first database file(s) of words or expressions potentially expressing bias; (S20) providing a set of bias type codes; (S30) providing second database file(s) including alternative expressions related to each of the words expressing bias; (S40) establishing a correspondence between each of the words expressing bias and at least one of the bias type codes;
(S50) scanning the corpus while (S60) comparing each word with items of the first database; testing for a match (S65) until a word of the corpus matches (Y) one or more of the words expressing bias, thus (S70) identifying a matched word and at least one bias type code. Otherwise, if no match is found (N), a message to that effect may be presented (S80) to the user, and the scanning process continues at the next word (via step S140). If a match is found, the process continues by (S90) presenting the matched word and (S100) presenting a representation of its corresponding bias type to the user as an instance of potential bias. It is generally useful to present a representation of bias type to the user, as the user may not recognize the bias type. The process also preferably includes (S110) presenting at least one of the alternative expressions to the user. At this point in the process, the system waits (S120) for a user to make a choice. A user has an option (at step S125) of (1) selecting a particular one of the alternative expressions offered, or (2) rejecting all of the alternative expressions (simply by continuing without selecting any of the proffered alternatives, and optionally by (3) editing the text to substitute an un-proffered alternative of the user's own). In a further step (S130), if a user has selected (at step S125) a proffered alternative expression for the matched word then it is substituted for the matched word in the corpus. The process then continues (S140) by repeating the scanning etc. until the entire corpus has been scanned. The end of the corpus is tested at step S150. If the end has been reached, i.e. all words have been scanned (Y), the process ends (S160); if the end has not been reached (N), the scanning process continues at step
S50. Of course, the corpus processed in this manner may be an entire document or only a portion such as a chapter, page, paragraph, sentence, or word, etc. selected by the user from a larger body of text. If a user wishes to have instances of potential bias only identified, without presentation of any alternative expressions, the steps (S110 - S130) involving presentation of alternatives may be omitted. For fully automatic operation with no user interaction (described further below), steps S75 - S125 may also be omitted. The expressions potentially expressing bias may be individual words, phrases including more than one word, suffixes (such as "-ette," "-trix," and "-ess"), etc. The system may be programmed to count instances of bias found, accumulating the count of words expressing bias and a total word count in order to calculate a percentage or other index characterizing the corpus with respect to a particular selected bias type, with respect to a selected set of bias types, or with respect to all bias types. Thus quantitative bias indices or "scores" may be assigned by the system to any corpus of text, indicating various measures of bias.
The alternative expressions proffered to a user by the system preferably come from individual databases of alternative expressions related to each bias type. These alternative expressions are generally not necessarily synonyms of the potentially biased expressions detected, but in some cases they may be bias-neutral synonyms. The databases containing alternative expressions are preferably constructed from a large corpus of actual diverse texts written by experts in inclusive and diversity-conscious expression or written by diverse authors who have found and used alternative expressions that are unbiased or at least less biased than the expressions they are intended to replace. For any given biased expression, there may be one or more alternative expressions. If alternative expressions are offered to a user, they may be offered in a particular order, e.g., ranked in a preferred order of usefulness. However, a user may disagree with this order or prefer a different order, so the system allows a user to modify the various databases containing alternative expressions, including modifying the preferred order for sets of alternative expressions. User modification and/or augmentation of the alternative expression databases is described below. Each individual database of alternative expressions related to a particular bias type may be provided to the user in the form of a computer-readable medium. Such computer- readable media may include magnetic diskettes, magnetic tapes, CD-ROMs, recordable optical disks, fixed or removable hard disks, memory cards, data cartridges, etc. Of course each of these media may include one or more alternative expression databases. It will be apparent that, for some applications, another embodiment of the invention may be made in which all the treated bias-types are contained in a single database, having one or more bias-type codes being associated with each potentially biased expression in the database. The preferred embodiment has a number of modular data bases, with each modular data base being substantially dedicated to a particular bias type. This organization of the data allows for easier and more flexible updating and/or customization of the databases, and allows for customization of the overall system for particular needs of each user. Thus a particular user primarily interested in gender bias to the exclusion of other types of bias, for example, is able to install just a gender-bias database.
Process details
This description of the methods of this invention continues now with reference to FIGS. 2 - 5, showing flowcharts describing the steps of a preferred process performed by the system. As the preferred system is interactive, the system responds at various stages of the process in accordance with choices made by the user. In FIGS.
2 - 5, as in FIG. 1, the reference numerals S200 etc. refer to steps and substeps of the process. Other reference numerals, such as E01, are labels identifying branching points in the process. The term "procedure" is used for some steps having several substeps. It will be recognized by those skilled in the art that the same steps or equivalent functions may be programmed in various programming languages and adapted to the particular syntax and logical structures appropriate to the programming language selected.
The preferred process uses a number of preferred file structures. A control file contains records related to various bias types. There are preferably a number of bias word database files, one for each bias type. Each bias word database file contains records related to words that potentially express bias of that bias type. There is preferably a bias word help database file, containing help text associated with each pair consisting of a word of the bias word database files and a bias type associated with that word. To provide for fully automatic operation, with no user interaction, and/or to provide a preferred order in which alternative expressions are presented to a user for interactive-mode operation, each alternative expression may have a quantitative numerical weight factor, higher weight factors indicating higher preference for replacement of the biased word. In fully automatic operation, a particular substitution of an alternative expression is preferably made or not made in accordance with statistical weighting by the numerical weight factors. These numerical weight factors are preferably assigned while constructing the alternative-expression database. The weight factors are preferably derived from statistical analysis of a large corpus of actual diverse texts written by experts in inclusive and diversity-conscious expression. The factors may be assigned by a consensus of diverse authors who have found and used alternative expressions that are unbiased or at least less biased than the biased expressions they are intended to replace. Some representative file structures that are useful in the efficient performance of the process of the invention are illustrated in
Table 1.
Field name Field type Example
Control file record
1) Bias type A10 Gender
2) Bias type code A2 GN
3) Select indicator Al X or blank
4) Help text A80 Male/female words etc.
Bias word database record
1) Word A30 garbage man
2) Alternate word counter A3 001 - 999
3) Bias type code A2 GN
4) Alternate word A30 trash collector
5) Numerical weight factor for replacement A2 01-99
Bias word help database record
1) Word A30 garbage man
2) Bias type code A2 GN
3) Help text A80 One may not want to promulgate the idea that only men are appropriate for this occupation.

Table 1. Representative file structures.

As illustrated in FIG. 2, the detailed process starts (S100) with step S200, invoking a startup procedure (S230), displaying an introductory screen (S235) with the system's title, such as "Bias Detective™," for a time interval, acquiring access to bias word database files (S240), loading a bias control file into working memory (S250), and displaying a data entry screen (S260). If the user wishes to find and correct biased language in a document (selected in step S210), the process includes invoking a procedure (S215) for importing the selected document (S265). At this point in the process, the user may make a choice, e.g., by pointing to an icon with a pointing device or by typing a keystroke on a keyboard, as to the next step performed by the system. Step S270 handles the event of a keystroke, cursor movement, mouse movement, etc. initiated by the user. For simplicity of description, the process will be described here in terms of keyboard operations by the user. The user may choose to begin scanning a document for biased words (S400) (e.g., by pressing function key F2), choose to select bias type(s) (S300) (e.g., by pressing function key F6), or may type another key (S280). It will be understood that the reference herein to particular function keys F2 and F6, for example, is merely for clarity of description, and not significant to operation of the invention. The user may choose to exit the process (e.g., by depressing the Esc key), causing a branch to E03; may type a non-printable character; or may type a printable character, causing a branch to entry point E01 for continuing the process. In the latter case (typing of a printable character), the character typed is displayed on the screen and saved in working memory (S290), and the process returns (E01) to step S270. If and when the user chooses to exit the process, step S220 performs an orderly ending procedure, prompting the user to save the text file being processed, closing open files, etc. in a conventional manner.

FIG. 3 illustrates substeps of the bias-type selection procedure (S300) and the bias word scan procedure (S400). Step S305 clears the screen and displays a bias-type selection screen. (In a windowing environment, the screen is not cleared, and step S305 displays a bias-type selection window superimposed on a portion of the screen. It will be understood that other steps of the process may also be implemented in a conventional manner for windows.) Step S310 displays a first bias control file entry. At step S320, a decision is made: if there are remaining bias control file entries to display (Y), step S330 displays the next bias control file entry and returns to S320. If there are no more bias control file entries to display (N), a cursor is moved (S340) to a first selection indicator position for the bias control file entries displayed. When step S340 has been completed, several bias types are displayed for the user's selection. At step S350 user action is expected, and the system waits for the user to make a choice. The user may select a bias type, e.g., by typing the character "X" to choose (or the space character to reject or "un-select") the bias type represented at the cursor position. A user can select more than one bias type.
For each bias type selected, the system will scan the corpus of text to detect that type of bias and proffer alternative expressions to the user for each bias word detected, as described hereinbelow. A decision is made (S350) depending on the character typed by the user. If the user types any character except "X," space, or Esc, the system loops (C01) to allow the user to view the bias-type selections made so far and waits for another selection entry or Esc. If the user types the Esc key, step S370 clears the bias criteria screen or removes its window, re-displays the data entry screen, and returns at point E01 to allow the user to begin scanning the corpus of text. If the user types an
"X" or space to select the currently-indicated bias type, step S380 saves the control file select indicators and moves the cursor to the next bias-type selection indicator, returning to branch point C01.
Help text may be provided to the user, to explain and distinguish the various bias types and to aid the user in making bias type selections. For example, such help text can describe the differences between race and nationality, or the differences between cultural bias and religious bias. The help text can be made context-sensitive in a manner known to those skilled in the art.
The corpus of text to be scanned for bias may include any string of character data, and may also include formatting indicators such as paragraph markers, etc. Examples of such a corpus include a document previously stored in a file such as a conventional word-processing file, text typed by the user for display on the data entry screen, or a text portion selected by the user and identified by any of the selection means conventionally employed. In conventional word processors, the text selected at any time is often indicated by highlighting the selected text in a distinctive color or reverse video.
Continuing with the process flow illustrated in FIG. 3, the bias word scan procedure, generally denoted S400, scans a corpus of text (e.g., a document) for bias words, i.e., words that potentially express bias. Bias word scan procedure S400 begins by saving the current cursor position (S405) and then moving to the beginning of the corpus to be scanned (S410). In the preferred embodiment, the corpus to be scanned is represented by data in a portion of working memory, which may be an array. At step S420, a decision is made depending on whether the end of the corpus has been reached. (Step S420 is executed at several steps of the scanning process, indicated by B01, to determine whether the scanning procedure is complete.) If the end of the corpus has been reached (Y), step S430 re-displays the data entry screen as it was prior to the scan, returns the cursor to the position previously saved, and returns (E01) to execute decision step S270. If the end of the corpus has not been reached (N), step S440 gets the next word in the corpus (in a conventional manner known and commonly used in word-processor spell-checking). As mentioned above, the term "word" as used herein includes phrases, suffixes, etc. Step S450 looks up the word in the bias word database file, i.e., searches for a match in a conventional manner. It will be understood that the searching may be performed in a copy of the bias word database file that is stored in working memory or a cache. At step S460, if the word was not found in the bias word database file, step S420 is repeated (B01). If the word was found, step S470 is performed, which calls control file bias-type match procedure S600, described below with reference to FIG. 4. Then, at decision step S480, the system determines whether the bias type(s) of the found bias word is a selected bias type, i.e., whether its bias type has an "X" in the control file. If the bias type was not selected (N), B01 returns to step S420. If the bias type of the found bias word was selected (Y), step S490 highlights the word on the display and calls procedure S700 to get alternate expression(s) for the found bias word. Step S500 saves the screen, including the highlighted word and cursor position. Step S510 calls procedure S800, and step S520 calls procedure S900, both described in detail below.
At this point in the preferred process, the system has identified a word matching one of the words in the database of a user-selected bias type and will use procedure S700 to proffer one or more alternative expressions for the user's consideration, by displaying the alternative expression(s) using procedure S800. The system then provides means for the user to select an alternative expression (or reject all of the alternative expressions proffered) using procedure S900.
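The scanning and lookup just described can be pictured with the following condensed Python sketch; whitespace tokenization, the dictionary-backed bias word database, and the names used are simplifying assumptions, and multi-word phrases and the screen handling of steps S490 - S520 are omitted.

# Condensed sketch of the bias word scan (cf. steps S410 - S490): walk the
# corpus word by word, look each word up in a bias word database, and collect
# findings whose bias type is among the user-selected types. Data and names
# are assumptions for illustration; phrases and display handling are omitted.
bias_words = {                    # word -> (bias-type code, ranked alternative expressions)
    "chairman": ("GN", ["chair", "chairperson"]),
    "stewardess": ("GN", ["flight attendant"]),
}
selected_types = {"GN"}

def scan(corpus: str):
    findings = []
    for position, raw in enumerate(corpus.lower().split()):
        word = raw.strip(".,;:!?\"'")
        entry = bias_words.get(word)
        if entry is None:
            continue                          # word not in the bias word database
        bias_code, alternatives = entry
        if bias_code in selected_types:       # only user-selected bias types are reported
            findings.append((position, word, bias_code, alternatives))
    return findings

for pos, word, code, alts in scan("The chairman welcomed the stewardess."):
    print(f"word {pos}: '{word}' [{code}] -> suggestions: {alts}")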
FIG. 4 shows the process flow for procedures S600 and S700. Procedure S600 starts by examining (S605) the first control file record. At step S610, a decision is made based on whether the instant control file record's selection indicator is an "X."
If it is (Y), step S620 tests whether the bias type of the found bias word is the same as the control file record bias type. If, at step S610, the control file record's selection indicator is not an "X" (N), then step S630 tests whether this is the last control file record. At step S620, if the bias type of the found bias word is the same as the control file record bias type (Y), the process exits procedure S600 and continues (B03) at step
S480 (FIG. 3). If it is not (N), the process continues at step S630. At step S630, if this is the last file record (Y), a match was not found for the bias type of the found bias word among the control file records that are selected for bias scanning, and the process continues at step S480 (B03). If it is not the last file record (N), then step S650 moves to the next control file entry and continues (BT1) the control file bias-type match procedure S600 at step S610. Procedure S700, which gets alternative words, begins at step S705 by saving the first alternative word attached to the first lookup in the bias word database file. Step S710 tests whether this is the last alternative word in the bias word database file. If it is (Y), the process continues (B04) at step S500. If it is not (N), the process continues at step S720, looking up the next entry of the found bias word in the bias word database file. The next alternative word is saved, and the process returns (AW1) to step S710.
Procedure S800 (for displaying alternate words and expressions to the user) begins with step S805, highlighting the found bias word on the display screen. Step S810 displays a box or window listing all the alternative words found in the bias word database file. Thus the alternative words are proffered to the user. Step S820 looks up the found bias word in the bias help database file. Step S830 displays a box or window containing help text to explain bias implication(s) of the highlighted bias word. Steps S820 and S830 may be omitted in embodiments lacking an optional bias help database file. When procedure S800 ends, control returns (B05) to step S520, which invokes procedure S900 for user selection of an alternate expression.
Alternate word selection procedure S900 begins with step S905, moving the cursor to the first alternate word in the alternate word selection box and waiting for a user keystroke. Successive decision steps S910 and S940 switch to various steps depending on the user's keystroke. If the keystroke is a particular function key, such as F2, step S920 removes the highlight from the found word, removes box(es) from the screen, restores the scan screen, ends procedure S900, and returns (B06). If the keystroke is the Esc key, step S930 moves to the end of the corpus and ends procedure S900. If the keystroke is a cursor arrow, step S950 moves the cursor in the indicated direction within the alternate word box. Any other key except <Enter> (carriage return) returns (AS1) to step S910. A user's pressing <Enter> selects the alternate word at the current cursor position in the alternate word selection box, and step S960 inserts the selected word into the scan screen, replacing the highlighted bias word on the screen and in the text corpus. Then step S980 restores the scan screen, and procedure S900 ends.
Step S970 begins procedure S1000 to replace the found bias word at other locations in the corpus. Step S1010 prompts the user for a response indicating whether to replace the next instance of the found bias word, all instances of that bias word, or no more instances of that bias word. If the response indicates no more instances, the procedure branches to S980 (label AW3). If the response indicates all instances, the procedure branches to S1020, which replaces all instances in the corpus with the selected alternative expression. When the end of the corpus is reached, the procedure branches to S980 (label AW3). Step S1030 locates the next instance of the current found bias word and prompts the user for a response to replace, skip, or exit. Step S1040 handles the response. If the response indicates an exit, there is a branch to S980 (label AW3). If the response indicates a skip, there is a branch to S1030 (label AW2). If the response indicates a replacement, there is a branch to S960 (label AW4).
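A minimal sketch of this replace-further-instances choice (the function replace_remaining and the mode strings are assumptions, and the interactive prompting of steps S1010 - S1040 is reduced to a parameter) could be:

# Sketch of the textual effect of procedure S1000: after one interactive
# replacement, the user may replace the next instance, all remaining
# instances, or no more. Prompting is reduced to the 'mode' argument.
def replace_remaining(text, biased, replacement, mode):
    """mode is 'all', 'next', or 'none'."""
    if mode == "all":
        return text.replace(biased, replacement)
    if mode == "next":
        return text.replace(biased, replacement, 1)
    return text                                # 'none': leave remaining instances unchanged

sample = "A chairman spoke. Another chairman agreed."
print(replace_remaining(sample, "chairman", "chairperson", "all"))
# A chairperson spoke. Another chairperson agreed.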
When the process described above is modified to process a corpus of text for automatically (non-interactively) detecting and correcting bias in the corpus, the first database file of words expressing bias, the set of bias-type codes, and the second database file including at least one alternative expression related to each of said words expressing bias are still provided, and each word expressing bias is associated with at least one bias-type code. The corpus is scanned while comparing each word of the corpus with the first database until a word of the corpus matches one or more of the words expressing bias, thus identifying a matched word and at least one of said bias-type codes. If no match is found, the process may include presenting a message that no bias was found. Otherwise, the process automatically replaces each matched word with one of the alternative expressions from the second database, and continues scanning through the corpus until the entire corpus has been scanned. The result is a physically modified corpus of text having unbiased alternative expressions substituted for every biased word found in the input text. As mentioned above, this type of fully automatic detection and correction is especially useful in time-critical situations such as daily newspaper production, and especially when the processed text is edited after processing in this manner. If it is desired to avoid repetitious text, the second database should include more than one alternative expression related to each of the words expressing bias. Then, at each instance after the first instance of replacing the matched word with an alternative expression, a different alternative expression is substituted. In a simple implementation of this type of automatic processing, the selection of the alternative expression at each instance may be made according to the predetermined order in which alternative expressions occur in the second database. In a preferred embodiment of the process, the selection is made according to successively smaller numerical weight factors, where these factors have been stored in association with each alternative expression in the alternative-expression database. The assignment of numerical weight factors is described above in connection with Table 1.
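For illustration only, one way to realize this weight-ordered automatic substitution is sketched below; the data layout, the helper auto_correct, and the cycling behavior are assumptions drawn from the description above, not the patent's own implementation.

# Sketch of fully automatic correction: alternatives for each biased
# expression are ordered by descending numerical weight factor, and successive
# matches cycle through them so that repeated matches need not all receive the
# same replacement. Entries, weights, and helper names are assumptions.
import itertools
import re

weighted_alternatives = {
    "garbage man": [("trash collector", 60), ("sanitation worker", 40)],
}

def auto_correct(corpus, db):
    for biased, options in db.items():
        ranked = [alt for alt, _weight in sorted(options, key=lambda o: o[1], reverse=True)]
        cycle = itertools.cycle(ranked)    # successive instances take successively lower weights
        corpus = re.sub(re.escape(biased), lambda _m: next(cycle), corpus, flags=re.IGNORECASE)
    return corpus

print(auto_correct("Ask the garbage man; every garbage man knows.", weighted_alternatives))
# Ask the trash collector; every sanitation worker knows.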
Apparatus
FIG. 6 shows a block diagram of apparatus made in accordance with the invention, and/or used to implement a process performed in accordance with the invention, in a preferred embodiment. The apparatus of FIG. 6 may be implemented as a dedicated word processor, i.e., a computer-implemented system used only for word processing tasks. The methods and processes described above may also be implemented to function with a general-purpose digital computer of a conventional type, also illustrated schematically by the apparatus of FIG. 6. The system 10 has a text input 20, a central data processing unit (CPU) 30 of a known type, a memory 40 for storing bias codes, a memory 50 for storing words and phrases related to bias and their corresponding bias-type codes, and output apparatus 60 for presenting text output to a user. Output apparatus 60 may be a display screen, teletypewriter, modem, or printer, for example. The text input 20 may be, for example, a keyboard, a modem, a scanner with OCR (optical character reading) capability, a disk or tape drive accepting disks or tapes containing textual data, or a card or cartridge reader accepting memory cards, data cartridges, etc. carrying textual data. A preferred system has both a keyboard and a storage medium reader such as a diskette drive for text input 20.
A device particularly useful in the system is a computer-readable storage medium 25 (such as a memory cartridge or a diskette) containing two or more databases, including a first database 26 of words expressing bias, and a second database 27 including at least one alternative expression for each of the words expressing bias. The first database has, for each of the words expressing bias, a bias-type code as defined in this specification. The second database 27 may also include a numerical weight factor for each alternative expression. The first and second databases 26 and 27 may be combined into one database. This computer-readable storage medium may also include a third database 28 including help messages and/or training messages with explanation(s) of particular bias types, examples of groups that may find a particular bias word offensive, rationale for including a word in the first database, etc. Inclusion of third database 28 is especially useful in a computer-based system for language-bias education. This third database 28 may have, associated with each entry, a bias word and/or a bias type, to facilitate context-sensitive help by the system. Each of the provided computer-readable storage medium devices may be limited to one bias type or may include several bias types in the same device. In operation of the system 10, databases 26, 27, and 28 contained in computer-readable storage medium 25 are read into working memory. Medium 25 may also include a control file database such as described in Table 1 and the accompanying text. The bias word database and bias word help database in medium 25 also may have the file structures illustrated in Table 1.
If the system is to present alternative expressions to a user, it has a memory 70 for storing alternative expressions and optionally storing bias-type codes. It may optionally have an audio input 80 and/or an audio output 90. An audio input 80 may perform the function of text input 20 when used with optional speech recognition apparatus (not shown) or with speech-recognition software operating on CPU 30. The CPU 30 is programmed to perform the process described above, including the steps of comparing text entered into the system through text input 20 with words and phrases related to bias in memory 50, and presenting bias codes retrieved from memory 50. The software program performing these functions also may provide for optionally substituting a selected alternative expression for a matched word, as determined by a user. It will be apparent to those skilled in the art that memories 40, 50, and 70 may be combined. For many applications memories 40, 50, and 70 are preferably combined into a suitably-organized single memory. Similarly, program instructions directing the operation of CPU 30 to perform the process steps described hereinabove may be stored in a read-only memory (ROM) of CPU 30, in a random-access memory (RAM), or in a main memory, a portion of which is used as working memory. The working memory may be combined with memories 40, 50, and 70. System 10 may have a network port 95 for connecting with a conventional local or wide-area network, and particularly for connecting with an internetwork such as the Internet, to allow the method of the invention to be used by a multiplicity of users sharing access to system 10.
INDUSTRIAL APPLICABILITY
The following examples are intended to be purely exemplary and are not exhaustive of the uses of the invention's methods and apparatus. Authors may use the invention to avoid a particular type of bias in their writings or to selectively introduce particular types of bias into their writings for particular literary purposes. Publishers and editors may use the invention to characterize text with respect to biases, to edit text, or to suggest changes to authors. Professionals in marketing and advertising may use the invention to expand the appeal of their message, to ensure that it is as inclusive as possible. Legal researchers may use it to characterize text contained in documentary evidence. Patent attorneys or agents may use it to find biased language in patent applications, to avoid language that may unnecessarily limit the scope of a claimed invention. Teachers may use the invention to instruct their students about biases in their writing and about methods for recognizing and/or avoiding biases. Librarians, educational textbook buyers, and educators in general may use the invention to characterize text materials with respect to biases. Reviewers of books, articles, and the like may use the invention to characterize the works they review. The many diverse uses of the invention are expected to proliferate and to become more facile as more text becomes available in machine-readable form.
There are various levels of sensitivity regarding bias among writers, editors, reviewers, and readers, and for a particular person, the level of sensitivity to bias may even vary considerably with time, with circumstances, or with the type of bias in question. Societal sensitivity to specific biases is also dynamic, as evidenced by the rise of sensitivity to racial and gender bias in recent decades. These variations are accommodated by the invention in several ways. In all interactive uses, the user has the opportunity, wherever bias has been detected, to type into the text the user's own alternatives, not proffered by the system. One other way that the above variations are accommodated is a method in which individual bias types may be selected for each corpus to be scanned, so that any particular bias type(s) may be omitted from consideration. Another way is a method in which the user has the option, at each detection of an instance of potential bias, of disregarding the bias notification and proceeding without modifying the text. Yet another way is a method whereby the user may modify or augment the existing bias-type databases, by adding terms, subtracting terms, modifying terms, or even by defining a new bias-type database beyond those databases initially provided. The producer of the invention may from time to time provide database updates that reflect changes in language bias. These processes may be facilitated by use of the representative file structures illustrated in Table 1, in which all the fields in the bias word database and the bias word help database are alphanumeric fields. The user may edit these files with any suitable conventional editor, or with a special editor facility provided that is customized to prompt the user for adding, deleting, or modifying entries in each of the database files described herein. Thus both producer updating of the bias word and alternative expression databases, and user modification and/or augmentation of these databases are provided.
When a representation of a bias type is presented to a user, additional information may also be made available. For example, a context-sensitive help message may be associated with each word or phrase expressing bias. The help message can include an explanation of the particular type of bias, examples of groups that may find the biased text offensive, and/or other helpful information. Such help messages may be provided in bias word help database file(s) as described hereinabove. Other embodiments of the invention and modifications to adapt it to various usages and conditions will be apparent to those skilled in the art from a consideration of this specification or from practice of the invention disclosed herein. For example, the invention can be implemented in an embodiment to be invoked by a macro in conjunction with a word-processing program, or it can be implemented as a module for a computerized aid to creativity and problem solving (such as the one described in U.S. Pat. No. 5,153,830 mentioned above). It will be recognized that particular file structures may be employed that are functionally equivalent to those illustrated herein. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the invention being defined by the following claims.
Having described our invention, we claim:

Claims

1. A system for aiding in examination of a corpus of text for bias, said system comprising a memory, a central data processing unit, means for entering the corpus of text as input to said central data processing unit, and means to present text to a user, said system being characterized in that:
(a) said memory stores
(i) bias-type codes, each indicating a type of bias;
(ii) words and phrases related to bias, each of said words and phrases being associated with one or more of said bias-type codes; and optionally (iii) alternative expressions for each of said words and phrases related to bias;
(b) said central data processing unit is adapted for comparing words and phrases of said corpus of text with said words and phrases related to bias; and
(c) said means to present text
(i) presents to a user words or phrases of said corpus of text which match said words and phrases related to bias; and optionally
(ii) presents to a user said alternative expressions.
2. A method of processing a corpus of text for detecting bias in said text, comprising the steps of: (a) providing a first database file of words expressing bias;
(b) providing a set of bias-type codes;
(c) establishing a correspondence between each of said words expressing bias and at least one of said bias-type codes;
(d) scanning the corpus while comparing each word of the corpus with said first database until a word of the corpus matches one or more of said words expressing bias, thus identifying a matched word and said at least one of said bias-type codes, and otherwise presenting a message that no bias was found; and
(e) repeating said scanning step (d) until the entire corpus has been scanned.
3. A method as recited in claim 2, further comprising the step of: (f) presenting said matched word and a representation of its corresponding bias- type to a user, wherein said repeating step (e) further comprises repeating said presenting step (f) after each execution of step (d).
4. A method as recited in claim 3, wherein said presenting step (f) is performed by displaying on a computer display.
5. A method as recited in claim 3, wherein said presenting step (f) is performed by presenting an audio output.
6. A computer system, operated by program instructions to perform the method of processing a corpus of text for detecting bias as recited in claim 2.
7. A computer system, operated by program instructions to perform the method of processing a corpus of text for detecting bias as recited in claim 3.
8. A method as recited in claim 3, further comprising the steps of:
(g) providing a second database file, said second database file including alternative expressions related to each of said words expressing bias; and (h) presenting at least one of said alternative expressions to a user while performing said presenting step (f).
9. A method as recited in claim 3, wherein said presenting step (f) is performed by displaying within a window portion of a computer display.
10. A method as recited in claim 5, wherein said presenting step (f) includes emitting a distinctive alarm sound.
11. A method as recited in claim 5, wherein said presenting step (f) is performed by causing a computer to enunciate said matched word in a computer-generated voice.
12. A method as recited in claim 8, wherein said alternative-expression-presenting step (h) is performed by displaying the alternative expression on a computer display.
13. A method as recited in claim 8, wherein said alternative-expression-presenting step (h) is performed by presenting an audio announcement.
14. A computer system, operated by program instructions to perform the method of processing a corpus of text for detecting bias as recited in claim 8.
15. A method as recited in claim 8, further comprising the steps of:
(i) waiting for a user to perform at least one of the substeps of:
(1) selecting one of said alternative expressions, (2) rejecting all of said alternative expressions, and
(3) typing a substitute expression in place of said matched word in said text;
(j) if a user performed substep (1), then substituting said selected one of said alternative expressions for said matched word; (k) if a user performed substep (2), then continuing to perform said repeating step
(e); and
(l) if a user performed substep (3), then substituting for said matched word said substitute expression typed by the user.
16. A method as recited in claim 8, further comprising the steps of:
(i) providing a third database file, said third database file including help text related to each of said alternative expressions; and
(j) while performing said presenting step (h), selectively displaying to a user particular help text from said third database file, said particular help text being related to said at least one of said alternative expressions.
17. A method of processing a corpus of text for detecting and correcting bias in said text, comprising the steps of:
(a) providing a first database file of words expressing bias; (b) providing a set of bias-type codes;
(c) providing a second database file, said second database file including alternative expressions related to each of said words expressing bias;
(d) establishing a correspondence between each of said words expressing bias and at least one of said bias-type codes; (e) scanning the corpus and comparing each word of the corpus with said first database until a word of the corpus matches one or more of said words expressing bias, thus identifying a matched word and said at least one of said bias-type codes, and otherwise presenting a message that no bias was found;
(f) presenting said matched word, a representation of its corresponding bias-type, and at least one of said alternative expressions to a user;
(g) waiting for a user to perform at least one of the substeps of:
(1) selecting one of said alternative expressions,
(2) rejecting all of said alternative expressions, and (3) typing a substitute expression in place of said matched word in said text;
(h) if a user performed substep (1), then substituting said selected one of said alternative expressions for said matched word; (i) if a user performed substep (2), then continuing;
(j) if a user performed substep (3), then substituting for said matched word said substitute expression typed by the user; and
(k) repeating said scanning step (e), presenting step (f), waiting step (g), and one of said substituting or continuing steps (h), (i), and (j), until the entire corpus has been scanned.
18. A method as recited in claim 17, wherein the corpus is a selected subset of text, selected by a user from a larger body of text.
19. A computer system, operated by program instructions to perform the method of processing a corpus of text for detecting and correcting bias in said text as recited in claim 17.
20. A system for aiding in examination of a corpus of text for bias, comprising:
(a) memory means for storing bias-type codes;
(b) memory means for storing words and phrases related to bias, each of said words and phrases being associated with one or more of said bias-type codes; (c) central data processing unit means for comparing words and phrases;
(d) means for entering the corpus of text as input to said central data processing unit means;
(e) means for comparing words and phrases of said corpus of text with said words and phrases related to bias, thereby identifying a matched word or phrase; and
(f) means to present said matched word or phrase to a user.
21. A system as recited in claim 20, further comprising:
(g) memory means for storing alternative expressions for each of said words and phrases related to bias; and
(h) means to present said alternative expressions to a user.
22. A system as recited in claim 21, further comprising:
(i) means for optionally substituting a selected one of said alternative expressions for said matched word, as determined by said user.
23. A system as recited in claim 21, further comprising:
(i) means for automatically substituting a selected one of said alternative expressions for each said matched word.
24. A system as recited in claim 21, further comprising:
(i) a computer-readable storage device containing a first database, said first database file including words expressing bias and at least one bias-type code for identifying a type of textual bias associated with each of said words expressing bias; and a second database, said second database file including alternative expressions related to each of said words expressing bias.
25. A computer-readable storage device for use in a system for aiding in examination of a corpus of text to detect bias in the corpus of text, said storage device comprising: a first database comprising words expressing bias, said first database further comprising at least one bias-type code for identifying a type of textual bias; and a second database, said second database including at least one alternative expression related to each of said words expressing bias.
26. A computer-readable storage device for use in a system for aiding in examination of a corpus of text to detect bias in the corpus of text as recited in claim 25, said second database further comprising: a numerical weight factor corresponding to each of said at least one alternative expressions, said numerical weight factor being selected to indicate a quantitative preference for substitution of each alternative expression.
27. A computer-readable storage device for use in a system for aiding in examination of a corpus of text to detect bias in the corpus of text as recited in claim 25, said storage device further comprising: a third database comprising help text related to each of said alternative expressions.
28. A method as recited in claim 2, further comprising the step of:
(f) replacing said matched word with a predetermined alternative expression.
29. A method as recited in claim 2, further comprising the step of: (f) presenting to the user said matched word and a predetermined alternative expression, and presenting an option to replace every instance of the same matched word with said predetermined alternative expression.
30. A method of processing a corpus of text for automatically detecting and correcting bias in said text, comprising the steps of: (a) providing a first database file of words expressing bias;
(b) providing a set of bias-type codes;
(c) establishing a correspondence between each of said words expressing bias and at least one of said bias-type codes; (d) providing a second database file, said second database file including at least one alternative expression related to each of said words expressing bias;
(e) scanning the corpus while comparing each word of the corpus with said first database until a word of the corpus matches one or more of said words expressing bias, thus identifying a matched word and said at least one of said bias-type codes, and otherwise presenting a message that no bias was found;
(f) replacing said matched word with a predetermined one of said at least one alternative expressions; and
(g) repeating said scanning step (e) until the entire corpus has been scanned.
31. A method as recited in claim 30, wherein, to avoid repetitious text, said second database file includes more than one alternative expression related to each of said words expressing bias, and at each successive instance after the first action of performing step (f) of replacing said matched word with a predetermined alternative expression, a different alternative expression selected from said second database file is substituted.
PCT/US1997/019912 1996-11-01 1997-10-31 Interactive and automatic processing of text to identify language bias WO1998020428A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US74271396A 1996-11-01 1996-11-01
US08/742,713 1996-11-01

Publications (1)

Publication Number Publication Date
WO1998020428A1 true WO1998020428A1 (en) 1998-05-14

Family

ID=24985918

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/019912 WO1998020428A1 (en) 1996-11-01 1997-10-31 Interactive and automatic processing of text to identify language bias

Country Status (1)

Country Link
WO (1) WO1998020428A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021060920A1 (en) * 2019-09-27 2021-04-01 Samsung Electronics Co., Ltd. System and method for solving text sensitivity based bias in language model
US11074417B2 (en) 2019-01-31 2021-07-27 International Business Machines Corporation Suggestions on removing cognitive terminology in news articles
US11270080B2 (en) 2020-01-15 2022-03-08 International Business Machines Corporation Unintended bias detection in conversational agent platforms with machine learning model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4456973A (en) * 1982-04-30 1984-06-26 International Business Machines Corporation Automatic text grade level analyzer for a text processing system
US4888730A (en) * 1988-01-05 1989-12-19 Smith Corona Corporation Memory typewriter with count of overused words
US5258909A (en) * 1989-08-31 1993-11-02 International Business Machines Corporation Method and apparatus for "wrong word" spelling error detection and correction
US5677835A (en) * 1992-09-04 1997-10-14 Caterpillar Inc. Integrated authoring and translation system



Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: CA

122 Ep: pct application non-entry in european phase