WO2003030016A2

WO2003030016A2 - System for generating a collection of text materials

Info

Publication number: WO2003030016A2
Application number: PCT/EP2002/010718
Authority: WO
Inventors: Peter FRÖHLICH
Original assignee: Abb Research Ltd.
Priority date: 2001-09-27
Filing date: 2002-09-25
Publication date: 2003-04-10
Also published as: DE10147854A1; WO2003030016A3

Abstract

The invention relates to a system for automatically generating a collection of text materials, in particular for a story. A data processing device is provided and is designed to store a number of anecdotes as XML-DTD, each anecdote having a series of event-describing text paragraphs and associated annotations. By analyzing requirements, a number of text paragraphs are compiled to form a targeted collection of text materials; each text paragraph entry contains, as elements, a reference to the name of the anecdote, the annotations of the text paragraph and the text of the paragraph. Two types of requirements are considered during the analysis: queries, which specify that a specific annotation should be associated with the text paragraph being sought, and exclusion criteria, which specify that a specific annotation should not be associated with it. The system first determines a total set of text paragraphs by analyzing the queries and then selects the desired set of text paragraphs from it based on the exclusion criteria.

Description

System for generating a collection of text material

The invention relates to a system that automatically generates material collections for company-related stories. A story in this sense is a narrative based on the experience that employees of a company have gained in the execution of business processes.

From the field of knowledge management (see Don Cohen, Laurence Prusak: In Good Company - How Social Capital Makes Organizations Work. Harvard Business School Press, Boston, Massachusetts, 2001), it is known that the majority of knowledge in a company is not explicitly documented. Rather, the knowledge exists in the form of experiences in the minds of the employees. The company loses this knowledge when the employees concerned leave the company. In addition, business processes are not applied equally well throughout the company: employees in different parts of the company can benefit from the experience of their colleagues if that knowledge is made available to them. Various approaches are concerned with collecting this implicit experiential knowledge, generalizing it and incorporating it back into the business processes. An example is the so-called Experience Factory (see Basili, V., G. Caldiera, D. Rombach (1994): The experience factory. In Marciniak (ed.) Encyclopedia of Software Engineering, vol. 1. John Wiley & Sons, pp. 469-476). Here, an organization is set up that collects experience in a database, processes it and makes it available to projects.

However, such databases are unsuitable for certain types of knowledge, such as values, norms of behavior or beliefs. Abstract knowledge content is not internalized if it is formulated directly; it must be illustrated using processes and real examples (see Cohen, 2001). The concept of the story is therefore gaining importance in the knowledge management literature (see D. Snowden: The Paradox of Story, Journal of Strategy and Scenario Planning, Ark Publications, November 1999).

A story describes how a protagonist reacts to a problem or an opportunity and shows the result of this reaction. The narrator pursues a goal with a story, e.g. following a business process more closely. An anecdote, on the other hand, is a "natural" story that does not necessarily pursue a goal. In contrast to colloquial language, anecdote does not necessarily mean an amusing event, but simply a narrative about a sequence of events in working life.

Many companies are now using stories to improve their business processes (see Snowden, 1999). A typical procedure for creating a targeted story is as follows:

• Conducting interviews to record anecdotes from (completed) projects.

• Analysis of the anecdotes to identify the explicit and implicit knowledge.

Such an analysis determines:

- The type of knowledge representation, i.e. in what form is the knowledge applied in the anecdote (as a document or more generally: artifact, ability of a person, heuristic, or natural talent).

- The way in which knowledge is used (for judgment, decision, problem solving, etc.).

- The key message of the anecdote.

• Definition of goals that should be achieved by spreading the story, e.g. following a business process more closely or refuting a rumor. Derivation of specific requirements from the goals.

• Construction of the story from suitable elements of the anecdotes that contribute to the achievement of the goals or requirements.

The result of the procedure described above is a story that is recognized and understood by an employee of the company because it is based on their experience (in the form of anecdotes). However, the story as a whole is fictional, as it combines the elements of the anecdotes into a new plot.

However, the analysis process described above is very complex. When systematically constructing a story based on given goals and requirements, a large number of anecdotes must be viewed from which the elements of the story must be selected. However, no software tools exist for this screening.

The invention is therefore based on the object of specifying a system for automatically generating a text material collection, in particular for generating a text material collection for stories.

This object is achieved by a system for the automatic generation of a text material collection, which has the features specified in claim 1. An advantageous embodiment is specified in a further claim.

The invention accordingly relates to a system for the automatic generation of a text material collection, in particular for a story, in which a data processing device is present and is set up to store several anecdotes each as XML-DTD, each anecdote being a sequence of event-describing text paragraphs and associated annotations. By evaluating requirements, a set of text paragraphs is created as a targeted collection of text material, the text paragraphs each containing a reference to the name of the anecdote, the annotations of the text paragraph and the text of the paragraph as components. Two requirement types are taken into account in the evaluation: queries, which describe that a specific annotation should be assigned to the text paragraph being sought, and exclusion criteria, which describe that a specific annotation should not be assigned. The system is also set up to first determine a total set of text paragraphs by evaluating the queries, and then to use the exclusion criteria to select the desired set of text paragraphs from it. Preferably, a set of anecdotes provided by different company employees is used.

A further description of the system is given below using an exemplary embodiment shown in the drawing figures.

The figures show:

Fig. 1 shows the process of generating a material collection,

Fig. 2 shows an example of an anecdote,

Fig. 3 shows an anecdote supplemented by annotations,

Fig. 4 shows an XML DTD of an anecdote,

Fig. 5 shows an anecdote as an XML file, and

Fig. 6 shows a screen display for interactive input.

The system works with a standard data processing device that has the necessary means for data storage, processing and output. By means of suitable software, the system is set up for a mode of operation shown schematically in FIG. 1, in which two types of information are evaluated in a two-phase procedure.

On the one hand, the system uses a collection of anecdotes. These anecdotes have been annotated with comments by a knowledge management team in collaboration with the authors of the anecdotes; the comments concern, e.g., the core statements, values, rules, the type of knowledge representation or the type of knowledge application. In phase 1, the system extracts these comments.

On the other hand, the user specifies requirements for the story to be created. For example, one requirement may be that the story emphasizes the need to train employees. In the second phase (phase 2), the system extracts suitable paragraphs for a story from the anecdotes by comparing requirements and comments contained in the anecdotes.

In phase 1, the collection of anecdotes is read and processed by the system. Each of the anecdotes consists of two levels. On the first level, it contains the text that describes a sequence of events. On the second level, it consists of comments on the text. These comments are called annotations. The text of the anecdote corresponds to an account of events given by an employee of the company, such as is recorded in an interview.

FIG. 2 shows an example of an anecdote regarding experience with review meetings.

FIG. 3 shows part of this anecdote, supplemented by annotations. Here, for example, the book by Tom Gilb is classified as an artifact, i.e. explicitly represented knowledge, and the introduction of the review meetings is classified as a problem solution.

Text and annotations are represented by the system in XML (see Simon St. Laurent and Robert Biggar. Inside XML DTDs. McGraw-Hill, 1999). An XML DTD (XML document type definition) describes that documents consist of paragraphs, which in turn contain annotations. A section of an XML DTD that defines anecdotes is shown in FIG. 4. An annotation therefore consists of an attribute name (att) and a value (value). For example, the annotation "Problem: Software Quality" contained in FIG. 3 consists of an attribute name "Problem" and the assigned value "Software Quality".
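
The paragraph-and-annotation structure described above can be illustrated with a minimal Python sketch. Since FIG. 4 is not reproduced here, the element names anecdote, paragraph, annotation and text, the name attribute and the function read_anecdote are assumptions made for illustration; only the annotation attributes att and value come from the description. Note that the standard-library parser used here does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical anecdote document. Element names are assumed for illustration;
# only the "att" and "value" attributes are named in the description.
ANECDOTE_XML = """
<anecdote name="Experience with review meetings">
  <paragraph>
    <annotation att="problem solving" value="Review Meetings"/>
    <annotation att="artifact" value="Book by T. Gilb"/>
    <text>Project manager Meier therefore introduced regular review meetings.</text>
  </paragraph>
</anecdote>
"""


def read_anecdote(xml_source):
    """Return (anecdote name, list of (annotations, text)) for one anecdote."""
    root = ET.fromstring(xml_source.strip())
    paragraphs = []
    for par in root.findall("paragraph"):
        # Each annotation is an attribute-name/value pair.
        annotations = [(a.get("att"), a.get("value"))
                       for a in par.findall("annotation")]
        text = par.findtext("text", default="").strip()
        paragraphs.append((annotations, text))
    return root.get("name"), paragraphs


name, paragraphs = read_anecdote(ANECDOTE_XML)
```

Each paragraph is returned together with its attribute-value pairs, which is the form the later processing steps work on.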

The system saves anecdotes in XML files that conform to the DTD just described. As an example, the anecdote already shown in FIG. 3 is shown as an XML file in FIG. 5. When processing the anecdotes, the system reads the XML files and checks them for consistency with the help of the DTD. It collects all annotations and constructs, for each anecdote, a directory of the attribute-value pairs that occur as annotations. Finally, an overall directory is created for all the anecdotes.

Based on these directory structures, the system can now efficiently determine for each attribute-value pair in which documents, and even in which paragraphs, it occurs. For example, the system can now determine the set of all anecdotes and paragraphs that deal with the software quality problem.
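
One possible way to realize such directories is an inverted index from attribute-value pairs to the places where they occur. The following Python sketch assumes a simple in-memory representation; the function build_directories, the data layout and the second anecdote are invented for illustration and are not taken from the source.

```python
from collections import defaultdict

# Hypothetical in-memory anecdotes: name -> list of paragraphs, each paragraph
# given as a set of (attribute, value) annotation pairs.
anecdotes = {
    "Experience with review meetings": [
        {("problem", "Software Quality")},
        {("problem solving", "Review Meetings"), ("artifact", "Book by T. Gilb")},
        {("archetype", "project manager")},
    ],
    "Second anecdote (invented)": [
        {("problem", "Software Quality")},
    ],
}


def build_directories(anecdotes):
    """Build one directory per anecdote and one overall directory."""
    per_anecdote = {}
    overall = defaultdict(set)
    for name, paragraphs in anecdotes.items():
        local = defaultdict(set)
        for index, annotations in enumerate(paragraphs):
            for pair in annotations:
                local[pair].add(index)            # paragraphs within this anecdote
                overall[pair].add((name, index))  # occurrences across all anecdotes
        per_anecdote[name] = dict(local)
    return per_anecdote, dict(overall)


per_anecdote, overall = build_directories(anecdotes)

# All anecdotes and paragraphs that deal with the software quality problem.
hits = sorted(overall[("problem", "Software Quality")])
```

With this layout, looking up an attribute-value pair is a single dictionary access rather than a scan over all anecdote texts.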

In the second phase, the system creates the material collection for the story. For this purpose, requirements for the story to be created are evaluated. The system takes two types of requirements into account:

- Queries (Find): A query describes that the story should contain a certain topic (i.e. a certain annotation <x = y>). For example, it may be required that Tom Gilb's book appears in the story. The query is then Find <"artifact" = "Book by T. Gilb">.

- Exclusion criteria (Avoid): In this case, the requirement describes that anecdotes that contain a certain annotation <x = y> should not appear in the story. For example, it may be required that descriptions of code reviews (a certain type of review meeting) should not appear in the story. The requirement is then Avoid <"problem solving" = "Code Review">.

When creating the material collection, the system processes the queries in sequence. For each query of the form Find <x = y>, the total set M1 of all paragraphs that contain the annotation <x = y> is determined. The exclusion criteria are then applied to the total set M1 of the paragraphs found. A paragraph fulfills an exclusion criterion Avoid <x = y> if the annotation <x = y> appears neither in the paragraph itself nor in any other paragraph of the same anecdote.

Those paragraphs from the total quantity M1 that meet all the exclusion criteria are entered in the material collection. The paragraph entry has three components:

- a reference to the name of the anecdote,

- the annotations of the paragraph,

- the text of the paragraph.

In this way, all queries are processed in sequence. The result is a material collection, the set M, which contains, in the order of the queries, those paragraphs that meet all the exclusion criteria. Paragraphs that fulfill several queries are included in the collection only once (and not repeated).

The anecdote from FIG. 3 is again considered as an example. Three story requirements are assumed:

Find <"problem solving" = "Review Meetings">

Find <"archetype" = "project manager">

Avoid <"problem solving" = "Code Reviews">.

As described, the system evaluates the queries in sequence. For Find <"problem solving" = "Review Meetings">, the following paragraph is found:

"Project manager Meier therefore introduced regular review meetings. He learned the technique by reading Tom Gilb's book."

Now the exclusion criterion is checked: none of the paragraphs of the anecdote may contain the annotation <"problem solving" = "Code Reviews">. This condition is met, so the following entry is generated in the material collection:

Source: Experience with review meetings

Annotations: <"problem solving" = "Review Meetings">, <"artifact" = "Book by T. Gilb">

Text: Project manager Meier therefore introduced regular review meetings. He learned the technique by reading Tom Gilb's book.

The second query is evaluated accordingly, and the paragraph "Mr. Meier moderated the meetings" is found. If the requirement Avoid <"problem" = "software quality"> is added, none of the paragraphs will be added to the material collection: since the first paragraph of the anecdote contains this annotation, the entire anecdote is no longer taken into account. Exclusion criteria relate to properties of the anecdote as a whole, which do not have to be annotated in each individual paragraph.
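
The two-step selection and the example above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the Paragraph class, the functions make and collect, and the placeholder text of the opening paragraph (whose wording is not reproduced in the description) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paragraph:
    anecdote: str           # reference to the name of the anecdote
    annotations: frozenset  # (attribute, value) pairs of this paragraph
    text: str               # text of the paragraph


def make(anecdote, annotations, text):
    return Paragraph(anecdote, frozenset(annotations), text)


NAME = "Experience with review meetings"
paragraphs = [
    # Placeholder text: the opening paragraph is not quoted in the description.
    make(NAME, [("problem", "software quality")], "(opening paragraph)"),
    make(NAME, [("problem solving", "Review Meetings"), ("artifact", "Book by T. Gilb")],
         "Project manager Meier therefore introduced regular review meetings. "
         "He learned the technique by reading Tom Gilb's book."),
    make(NAME, [("archetype", "project manager")], "Mr. Meier moderated the meetings."),
]


def collect(paragraphs, finds, avoids):
    """Process Find queries in order, then filter by anecdote-wide Avoid criteria."""
    # Union of all annotations per anecdote: exclusion applies to the whole anecdote.
    by_anecdote = {}
    for p in paragraphs:
        by_anecdote.setdefault(p.anecdote, set()).update(p.annotations)

    collection = []
    for find in finds:
        m1 = [p for p in paragraphs if find in p.annotations]  # total set M1
        for p in m1:
            excluded = any(a in by_anecdote[p.anecdote] for a in avoids)
            if not excluded and p not in collection:  # each paragraph only once
                collection.append(p)
    return collection


finds = [("problem solving", "Review Meetings"), ("archetype", "project manager")]
m = collect(paragraphs, finds, [("problem solving", "Code Reviews")])
m_empty = collect(paragraphs, finds,
                  [("problem solving", "Code Reviews"), ("problem", "software quality")])
```

With the three original requirements, m holds the two paragraphs found in the example, in query order. Adding Avoid <"problem" = "software quality"> leaves m_empty without entries, because the annotation occurs in the first paragraph of the same anecdote.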

The system described is thus a system for the automatic creation of material collections for targeted stories. Automation is made possible by using the annotation concept described above. Such automation could not be achieved, for example, using information retrieval techniques (see Ricardo Baeza-Yates and Berthier Ribeiro-Neto: Modern Information Retrieval. ACM Press and Addison-Wesley, 1999), since these are too imprecise. It would be possible to find the term "review meeting" in an anecdote, but the meaning of the review meeting as a solution to the problem is only given by the annotation.

Since the material collections are shaped by the requirements, the proposed system particularly supports the formulation of precise requirements. In particular, it is ensured that the correct attribute names and attribute values are used in the requirements, i.e. that the same vocabulary is used in requests and anecdotes. The following properties of the system contribute to this:

Ensuring correct attribute names in the annotations: The DTD for collections of anecdotes contains a construct for listing attribute names:

<!ELEMENT attDecl EMPTY>

<!ATTLIST attDecl attname CDATA #REQUIRED>

The system ensures that the anecdotes only contain annotations whose attribute names appear in this list. This prevents spelling errors and inconsistent attribute names in the different anecdotes; e.g. the system reports an error if the "artifact" attribute is declared but a differently spelled variant such as "artefact" is used.
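
The check against the declared attribute names could look roughly like the following Python sketch. The function undeclared_attributes and the sample data are assumptions for illustration; the source does not specify how the check is implemented or how errors are reported.

```python
def undeclared_attributes(declared, annotations):
    """Return attribute names used in annotations but missing from the declared list.

    Names are returned once each, in order of first use.
    """
    declared_set = set(declared)
    unknown = []
    for att, _value in annotations:
        if att not in declared_set and att not in unknown:
            unknown.append(att)
    return unknown


# Hypothetical declared names (the attname values of the attDecl elements).
declared = ["problem", "problem solving", "artifact", "archetype"]

# Hypothetical annotations, one of them with a misspelled attribute name.
annotations = [
    ("problem solving", "Review Meetings"),
    ("artefact", "Book by T. Gilb"),
]

errors = undeclared_attributes(declared, annotations)
```

An empty result means that every annotation uses a declared attribute name; any other result lists the names that would be reported as errors.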

Interactive input: When formulating the requirements, an interactive input mask is used, as shown by way of example in FIG. 6. This input mask shows all attribute names and attribute values that appear in the anecdotes, which considerably simplifies the definition of requirements. In addition, the user gets immediate feedback on the requirements, i.e. the list of anecdotes whose paragraphs match the current requirements is displayed as soon as a requirement is formulated.

The system described is not only suitable for stories, but also for other types of text that rely on material collections from different sources, such as reports or articles.

Claims

1. System for the automatic generation of a text material collection, in particular for a story, wherein a data processing device is present and is set up

a) to store several anecdotes each as XML-DTD, each anecdote comprising a sequence of event-describing text paragraphs and assigned annotations, and

b) to create, by evaluating requirements, a set (M) of text paragraphs as a targeted collection of text material, the text paragraphs each containing a reference to the name of the anecdote, the annotations of the text paragraph and the text of the paragraph as components, wherein

b1) queries (Find), which describe that a specific annotation should be assigned to the text paragraph sought, and exclusion criteria (Avoid), which describe that a specific annotation should not be assigned, are taken into account as requirement types in the evaluation, and

b2) a total set (M1) of text paragraphs is first determined by evaluating the queries (Find), and the sought set (M) of text paragraphs is then selected from it using the exclusion criteria (Avoid).

2. System according to claim 1, characterized in that it is set up to enter the requirements to be taken into account by means of an interactive input mask, which displays all attribute names and attribute values that occur in anecdotes.