WO2010051966A1

WO2010051966A1 - Method for semantic processing of natural language using graphical interlingua

Info

Publication number: WO2010051966A1
Application number: PCT/EP2009/007868
Authority: WO
Inventors: Michael Mende
Original assignee: Lingupedia Investments Sarl
Priority date: 2008-11-07
Filing date: 2009-11-03
Publication date: 2010-05-14
Also published as: RU2011122784A; CN102272755A; RU2509350C2

Abstract

A method for processing natural language using a language processing system is described herein. Written or spoken text is input to the language processing system. The method includes the step of analysing the text syntactically. Next a step of extracting components of the text and their relation to each other within the text follows. A graph or graphical representation of the text is generated or used as a language independent representation of the meaning of the text. This graph or graphical representation is used to perform modelling, knowledge representation and processing at the language processing system. Further a system for processing natural language and a method of developing a language processing system are described.

Description

METHOD FOR SEMANTIC PROCESSING OF NATURAL LANGUAGE USING GRAPHICAL INTERLINGUA

The present invention relates to a method for processing natural language using a language processing system, in particular an electronic translation system, wherein written or spoken text is input to said language processing system. The present invention further relates to a translation system, and more particularly to an online translation system.

Processing natural language using language processing systems is problematic. Natural language consists of a sequence of words arranged in a specific manner to express a certain meaning. Put very simply, a language processing system may analyse the text by looking at the sequence word by word. Unfortunately, isolated analysis of single words cannot retrieve the meaning of the sequence correctly. In some cases it will succeed, but very often the analysis will fail, because a text is more than a cluster of words. The sentence "colourless green ideas sleep furiously" is formed from words arranged in a syntactically correct order, i.e. syntax (the rules and principles that govern the sentence structure of a language) is applied correctly. Yet it is easy to see that this sentence is completely meaningless. A system which focuses only on single words would try to process the sentence, although it obviously cannot be processed reasonably.

For example, an electronic translation system may process an input text sequence according to a process shown in Fig. 1. In Block 100, a user may enter an input text sequence for translation through, for example, a user interface, an electronic document, or the like. In Block 102, the electronic translation system may parse the sequence based on the syntax rules of the source language. In Block 104, the electronic translation system may perform dictionary look-up using the input language as an index into an output language dictionary for each word. In Block 106, the electronic translation system may render the translated words based on syntax rules of the output language, and in Block 108, the electronic translation system may output the result to the user, through, for example, the user interface, an electronic document, or the like.

Some systems known in the art use a semantic check. These systems use lexicons which combine words with attributes. When a semantic check is performed, the attributes have to be consistent. For example, the word "animal" is specified as being "living", a stone as "not living" and "eat" as requiring "living". Using this kind of semantic check, the sentence "the stone eats grass" can be identified as wrong, because stones are not living, whereas the sentence "the animal eats grass" is correct, because both "animal" and "eat" have the attribute "living".
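By way of illustration only, the attribute-based semantic check described above might be sketched as follows. The lexicon layout and the names `LEXICON`, `VERB_REQUIREMENTS` and `semantic_check` are assumptions made for this sketch and are not taken from any known system.

```python
# Hypothetical attribute lexicon; the entries mirror the example in the text.
LEXICON = {
    "animal": {"living": True},
    "stone": {"living": False},
}

# Attributes that a verb requires of its subject (assumed structure).
VERB_REQUIREMENTS = {
    "eat": {"living": True},
}


def semantic_check(subject: str, verb: str) -> bool:
    """Return True if the subject's attributes satisfy the verb's requirements."""
    subject_attrs = LEXICON.get(subject, {})
    required = VERB_REQUIREMENTS.get(verb, {})
    return all(subject_attrs.get(name) == value for name, value in required.items())


print(semantic_check("animal", "eat"))  # the animal eats grass
print(semantic_check("stone", "eat"))   # the stone eats grass
```

Such a check only compares attribute flags, which is why it does not scale to the more complex sentences found in natural language.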

These decisions are claimed to represent the process of understanding in artificial intelligence. Unfortunately this approach is very limited. When natural language is processed, sentences are generally much more complex and cannot be handled by such systems. Great efforts are made to solve these problems. On the one hand there are positions claiming that programming of semantics is impossible. On the other hand there are companies who invest millions into research in the area of semantics. Yet, no system known in the art is capable of processing natural languages properly.

In addition to the foregoing, development of electronic translation systems is labour intensive and results in one-to-one language pairs. For example, Fig. 2 illustrates a conceptual diagram of language pairs for four (4) languages: English, French, Spanish, and German. To translate from any one of the four languages to any other, the translation system actually uses six (6) sets of language pairs, that is, English-German, English-French, English-Spanish, French-Spanish, Spanish-German, and German-French.

The complexity of such a system increases dramatically as more languages are added. For example, adding a fifth language, Italian, adds four (4) additional language pairs bringing the total to ten (10) pairs. It is noteworthy that for each pair, complicated dictionaries, syntax, and semantic rule sets use large resources for development. Similarly, in such systems, each translation is undertaken individually even when translation into multiple languages is desired.
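The growth in the number of language pairs can be checked with a short calculation. The sketch below assumes one pair per unordered combination of two languages, as in Fig. 2; the function names are chosen for illustration only.

```python
def pairwise_language_pairs(n: int) -> int:
    """Number of unordered language pairs among n languages: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def pairs_added_by_next_language(n: int) -> int:
    """Additional pairs needed when one language is added to n existing ones."""
    return pairwise_language_pairs(n + 1) - pairwise_language_pairs(n)


print(pairwise_language_pairs(4))       # four languages
print(pairs_added_by_next_language(4))  # adding a fifth language
print(pairwise_language_pairs(5))       # five languages
```

Each added language therefore requires as many new pairs as there are languages already in the system.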

It is therefore an object of the present invention to improve and further develop a method for processing natural language which properly processes the semantics of text or other data such as input speech or the like. It is further an object of the invention to improve and further develop a language processing system for processing natural language which avoids some or all of the aforementioned problems.

In accordance with the invention the aforementioned object is accomplished by a method comprising the features of claim 1. According to this claim, such a method is characterized by the step of analysing said text regarding its syntax and morphology, the step of extracting components of the text and their relation relative to each other, the step of generating or using a graph or graphical representation of said text as a language-independent representation of the meaning of said text, and the step of performing processing of said text using said graph or graphical representation.

According to the invention, it has first been recognized that the problem can be solved by using findings in the neurological field. One basic finding relates to the fact that human cognition clearly separates syntax and semantics. If several people with different languages are sitting together and there is an umbrella in the room, everyone "knows" that this is an umbrella. But this "knowing" does not mean that the word "umbrella" is activated anywhere in a brain of a person present. However, for communication purposes the object "umbrella" is tagged with a language specific word. The people involved know the object without using language. If, for example, they want to go outside while it is raining, they activate their "tag" by means of the language-specific dictionary for communication purposes. They will for instance ask "May I have this umbrella?".

This clear separation of syntax and semantics (or of language-dependent information and language-independent information) is transferred to the method according to the invention. In a first step, the text which is input to the language processing system is analyzed regarding its syntax and morphology. At this step, the grammatical structure is analyzed. This results in a first basic understanding of the text. In the next step, the single components of the text are extracted. The text generally consists of sentences which comprise subject, object and verb, respectively. Each component can be extracted and its function within the sentence can be retrieved. These single components and their relation to each other are used in the next step of generating a graph or graphical representation of said text. The single components form nodes of the graph and the relations between the components are represented by edges. The graph is generally represented as matrices. However, the logical structure may also be represented graphically in order to improve understanding for humans. It has been found that this graph can be made completely independent of the language which is used with the text input to the system. The graph includes semantic information which can be easily used for further processing.
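A minimal sketch of such a graph, with components as nodes, relations as edges and a matrix form for processing, might look as follows. The class `SentenceGraph` and the relation labels "agent" and "patient" are assumptions introduced for this example and are not terms used in the description.

```python
from dataclasses import dataclass, field


@dataclass
class SentenceGraph:
    """Components of a sentence as nodes, their relations as labelled edges."""
    nodes: list[str] = field(default_factory=list)
    edges: dict[tuple[int, int], str] = field(default_factory=dict)

    def add_node(self, label: str) -> int:
        """Add the node if it is new and return its index."""
        if label not in self.nodes:
            self.nodes.append(label)
        return self.nodes.index(label)

    def add_edge(self, source: str, relation: str, target: str) -> None:
        """Store a directed, labelled edge between two components."""
        self.edges[(self.add_node(source), self.add_node(target))] = relation

    def adjacency_matrix(self) -> list[list[int]]:
        """Matrix form of the graph: 1 where an edge exists, 0 otherwise."""
        size = len(self.nodes)
        matrix = [[0] * size for _ in range(size)]
        for row, column in self.edges:
            matrix[row][column] = 1
        return matrix


# "the animal eats grass": the action is linked to the two objects involved.
graph = SentenceGraph()
graph.add_edge("eat", "agent", "animal")
graph.add_edge("eat", "patient", "grass")
print(graph.nodes)
print(graph.adjacency_matrix())
```

The node labels are written in English here only for readability; in a language-independent graph they would be identifiers or pictograms rather than words of a particular language.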

Instead of a graph and/or its graphical representation, other forms of graphical representation might be used. This for instance includes the representation using video, pictograms or the like.

Alternatively or additionally to the step of generating a graph or graphical representation, an already existing graph or graphical representation can be used at a step of using a graph or graphical representation. This graph or graphical representation describes knowledge which is already present at the language processing system. At this step, the components extracted from the text are matched with elements of the existing graph or graphical representation. Thereby a subset of the existing graph or graphical representation is determined.
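The matching of extracted components against an existing graph might be sketched as below. The existing knowledge is assumed, for this example only, to be stored as (source, relation, target) triples; the triples themselves and the function name `matching_subgraph` are hypothetical.

```python
# Hypothetical existing knowledge, stored as (source, relation, target) triples.
KNOWLEDGE = {
    ("person", "withdraw", "money"),
    ("person", "sit_on", "bench"),
    ("bench", "located_in", "forest"),
}


def matching_subgraph(components: set, knowledge: set) -> set:
    """Return the edges of the existing graph whose endpoints were both extracted."""
    return {
        (source, relation, target)
        for source, relation, target in knowledge
        if source in components and target in components
    }


print(matching_subgraph({"bench", "forest", "green"}, KNOWLEDGE))
```

The result is the subset of the existing graph that is determined by the components found in the text.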

According to a particularly preferred embodiment, the text input to the language processing system is modelled in a visual-graphical, or pictogram way. This results in a visual-graphical model which is a language-independent representation of the text and which can be understood by each user of the language processing system. Thus the users do not have to have knowledge about the languages involved. This is also true if the user does not understand any language used at the language processing system.

At the step of analysing text, information about the grammar of the language used in the input text is accessed. Each language has its own specific grammar dictating how words are to be arranged. To enable users without any programming knowledge to write grammars, grammatical data might be entered by means of a grammar editor. Preferably, this grammar editor is language independent. Only a certain formalisation of the possible structures of the languages is required at hand. By that, time-consuming development of different grammars for every single language can be avoided and instead, quick and efficient prototyping is possible. That way, new languages can be integrated into the language processing system quickly and straightforwardly. The grammars generated by the grammar editor might be used with language analysis as well as generation of language.

Preferably the step of analysing the text is performed by a syntactic layer of the language processing system. The language processing system might be configured modularly which enables reusability and modularity of the system. A syntactic layer might perform segmentation and tokenization of the text. Segmentation indicates the determination of the sentential units of the text, whereas tokenization means the identification of the concrete word forms within the sentence. When segmentation and tokenization are performed, the single elements as well as their relationship within the sentence can be analysed in terms of syntax and morphology.
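Segmentation and tokenization as described above can be illustrated with a deliberately simple, regular-expression based sketch. The splitting rules below are an assumption for this example; a real syntactic layer would need language-specific handling of abbreviations, numbers and the like.

```python
import re


def segment(text: str) -> list[str]:
    """Split a text into sentential units at sentence-final punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(sentence: str) -> list[str]:
    """Identify the word forms and punctuation marks within one sentence."""
    return re.findall(r"\w+|[^\w\s]", sentence)


text = "The stone eats grass. May I have this umbrella?"
for sentence in segment(text):
    print(tokenize(sentence))
```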

To improve modularity and to gain a method which can be universally used, the syntactic layer might be docked to the language processing system. Doing so, different languages can be easily integrated into the language processing system by adding a new syntactic layer to the system. As the processing within the system is performed with a language independent representation of the text, any language can be processed using the method according to the invention. Texts in new languages are transformed to the language independent representation by docking a new syntactic layer to the language processing system. Thus, the method can be used rather universally.

Each language docked to the language processing system might be represented in a separate syntactic layer. Thus, syntactical issues can be configured completely independent of each other.

Furthermore, it is possible that single languages can have common parts of the syntactic layer. For instance High German, Swiss German and Austrian German have great parts of grammar in common. Only several rules will differ. In this case the syntactic layer can have a part which is common to several languages and might have parts which are specific to a single language. This reduces the work of changing rules of the single languages and facilitates the entry of data used at the syntactic layer. Thereby abstractions of languages can be re-used with the single syntactic layers.
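One conceivable way to organise a common part and language-specific parts of a syntactic layer is to look up each rule first in the specific part and then in the common part. The sketch below uses Python's `collections.ChainMap` for this. The rule names and values are illustrative placeholders chosen for this example and do not represent an actual grammar.

```python
from collections import ChainMap

# Rules assumed to be shared by all variants (illustrative placeholders).
COMMON_RULES = {
    "adjective_position": "before_noun",
    "sharp_s": "ß",
}

# Rules that differ for a single variant (illustrative placeholders).
SPECIFIC_RULES = {
    "de-DE": {},
    "de-AT": {},
    "de-CH": {"sharp_s": "ss"},
}


def rules_for(language: str) -> ChainMap:
    """Language-specific rules take precedence over the common part."""
    return ChainMap(SPECIFIC_RULES[language], COMMON_RULES)


print(rules_for("de-DE")["sharp_s"])
print(rules_for("de-CH")["sharp_s"])
print(rules_for("de-CH")["adjective_position"])
```

A change to a common rule is then made once and becomes visible in every variant that does not override it.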

Language independent information might be extracted at a relational layer. Generally, language independent information comprises objects, actions and attributes and their relations. Objects are usually represented by nouns in languages like German, English or Chinese. Actions are generally described by the verbs of the text. But also adjectives can represent action. For instance two companies can be tagged with "compete" or "being competitive". Attributes can be sensory attributes like colour, temperature, size or quality as well as attributes like emotions. These objects, actions and attributes are extracted from the text by the syntactic and relational layer and are sent to the semantic layer.

At the step of generating the graph or graphical representation, objects, actions and attributes of a sentence or phrase of the text are linked together and represented as a graph or graphically. The graph representation (e.g. as a matrix or matrices) facilitates processing of the text within the language processing system. Though graphs can also be represented graphically, a pure graphical representation (without being a graph, e.g. video or pictograms) might be more powerful, as it offers greater flexibility regarding representation capability.

To achieve the language independence of the language processing system, objects, actions and attributes can be represented graphically or through pictograms. For instance a car can be represented by a pictogram of a car, a bench can be represented by a pictogram of a bench, the attribute "green" can be a green area, "to give" can be represented by pictograms of a person handing over an object to another person or by video and "to bark" can be represented by a sound. Thus, the graphical representation of the semantics can be understood by everyone without the objects, actions or attributes being tagged with terms of a specific language.

The step of processing the text might comprise the step of reasoning over the extracted semantics of the text. This can be done by comparing the extracted semantics with a model or determining the distance between the entities involved. A central part of the method might be a meaning world. The meaning world represents the object world. The object world's main task is to represent objects which are usually represented by nouns in languages like German, English or Chinese. It consists of several two-to-n-dimensional spaces which store the objects (or prototypes of them) and order them in meaningful combinations.

The objects of the object world can be organised using structure trees or structure networks which link the single objects logically. It has been found that humans organize knowledge about objects of the world and their relations in a meaningful structure. This organization is done in a non-uniform way. They are using concepts and categories for storing and sorting information. Such a grouping of categories can exist for electronic devices (e.g. computer, printers, and digital phones), paper (e.g. letter, document, and invoice), buildings (e.g. houses, museums, and offices), etc. The single objects of a category can be linked to other categories. For instance an office building has several rooms which are equipped with furniture, electronic devices, papers, etc. The furniture might comprise desks, chairs or bookshelves. On the other hand, a chair might be an office chair as well as a rocking chair. Both are chairs but fulfil completely different purposes. In this manner the single words are linked together in categories.
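A structure tree of this kind might be sketched as a mapping from each object to its category. The entries below follow the examples in the text; the single-parent layout and the function names are assumptions made for this sketch, since a structure network could link an object to several categories.

```python
# Each object points to its category (entries follow the examples in the text).
PARENT = {
    "office chair": "chair",
    "rocking chair": "chair",
    "chair": "furniture",
    "desk": "furniture",
    "bookshelf": "furniture",
    "computer": "electronic device",
    "printer": "electronic device",
}


def ancestors(term: str) -> list:
    """Categories above a term, from the nearest to the most general."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def common_category(first: str, second: str):
    """Nearest category shared by both terms, or None if there is none."""
    candidates = [first] + ancestors(first)
    for category in [second] + ancestors(second):
        if category in candidates:
            return category
    return None


print(common_category("office chair", "rocking chair"))
print(common_category("office chair", "desk"))
print(common_category("chair", "printer"))
```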

The meaning world further comprises an action space which is responsible for the representation of actions. Actions can be connected to any other unit in the meaning world, for instance the unit that is tagged with the English word "withdraw" can be bound to the objects "person", "money" and "cash point" being the actors involved. Such connections are called molecules.
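A molecule, i.e. an action bound to the units involved, might be held in a simple record. The class layout and the helper `molecules_involving` are assumptions for this sketch; the English labels stand in for language-independent units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    """An action unit bound to the units taking part in it."""
    action: str
    participants: tuple


MOLECULES = [
    Molecule("withdraw", ("person", "money", "cash point")),
    Molecule("give", ("person", "object", "person")),
]


def molecules_involving(unit: str) -> list:
    """Actions in which the given unit takes part."""
    return [molecule.action for molecule in MOLECULES if unit in molecule.participants]


print(molecules_involving("money"))
print(molecules_involving("person"))
```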

Further the meaning world might contain an attribute space which contains attributes of elements. Most, if not all attributes can be quantified in some natural manner. Sensory attributes like colour, taste, size or pressure have a one-to-three-dimensional representation used in various contexts. Colours for instance can be reproduced using a colour spindle which is defined by hue, saturation and brightness of the desired colour. Also emotions can be defined using a multidimensional representation. According to a model proposed by psychologists, a six or eight dimensional emotional simplex can be used to superpose all emotions of a human being. Thus, also emotions can be represented in a language independent way.
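As an illustration of such quantified attributes, a colour can be stored as a hue, saturation and brightness triple, and an emotion as a weighted superposition of basic emotions. The sketch uses the HSV conversion of Python's standard `colorsys` module as a stand-in for the colour spindle, and normalised weights as a stand-in for a point in an emotional simplex; both choices, and the function names, are assumptions for this example.

```python
import colorsys


def colour_attribute(red: float, green: float, blue: float) -> tuple:
    """Return (hue, saturation, brightness), each in [0, 1], for RGB in [0, 1]."""
    return colorsys.rgb_to_hsv(red, green, blue)


def emotion_point(weights: dict) -> dict:
    """Normalise non-negative weights so that they sum to 1 (a point in a simplex)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: value / total for name, value in weights.items()}


hue, saturation, brightness = colour_attribute(0.0, 1.0, 0.0)  # pure green
print(hue, saturation, brightness)
print(emotion_point({"happy": 3, "surprised": 1, "fearful": 0, "angry": 0}))
```

Neither representation contains a word of any particular language; a language-specific tag such as "green" is attached only when text is generated.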

The language representation of the text might be ambiguous. For example, in the sentence "the chicken is ready to eat", the chicken can be interpreted either as the eater or as the dish that will be eaten. In the sentence "we saw the man with the telescope", the telescope can be either in the possession of the man or "we" have it. These ambiguities can be resolved from the context of the sentence. This context can be retrieved from the meaning world. If the previous sentences relate to a farming environment, the chicken is most likely the eater. When the previous sentences refer to cooking, the chicken is most likely the one which will be eaten. These context-related issues might be retrieved from the meaning world.

An ambiguous text will correspond to several graphs or graphical representations, where the number of representations is the number of meanings which can be retrieved from the text. Using the meaning world, the representation which is most likely true can be determined.
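Selecting the most likely representation from context might be sketched as a simple overlap count between cue terms attached to each candidate meaning and the terms of the preceding sentences. The cue sets and the scoring rule below are assumptions for this example; the description does not specify how the meaning world computes the likelihood.

```python
# Hypothetical cue terms linked to each candidate meaning of
# "the chicken is ready to eat".
INTERPRETATIONS = {
    "chicken_as_eater": {"farm", "feed", "barn", "hen"},
    "chicken_as_dish": {"kitchen", "oven", "recipe", "cook"},
}


def most_likely(context_terms: set, interpretations: dict) -> str:
    """Pick the interpretation whose cue terms overlap most with the context."""
    scores = {
        name: len(cues & context_terms)
        for name, cues in interpretations.items()
    }
    # On a tie, max() keeps the first interpretation in dictionary order.
    return max(scores, key=scores.get)


print(most_likely({"the", "farm", "barn", "morning"}, INTERPRETATIONS))
print(most_likely({"preheat", "the", "oven", "recipe"}, INTERPRETATIONS))
```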

In the language processing system, there might be a relational layer which links the syntactic layers and the semantic layer. This relational layer might contain abstractions about possible relations between the objects in the layers. The relational layer receives information output by the syntactic layer and performs further generalization and abstraction.

According to one embodiment of the invention, the method might be used in a translation system. In this case the step of processing comprises the step of generating a translation of the text into a language different from the original language of the text. As the graph or graphical representation is language independent, it can be the basis of the translation into any language. When the steps of the method are performed, first the original text is analysed regarding its syntax and morphology. Second, the components of the text and their relations to each other are extracted and this information is used for generating a graph or graphical representation of the text or for using an existing graph or graphical representation as language-independent representation. After an optional semantic check, the language-independent representation is transferred to a textual representation. This step of transferring can be performed by the syntactic layer, as this layer uses syntactical and morphological information of the target language. As a result of the modular configuration of the system, each language can theoretically be translated into each other language. As there is a language independent platform in between, each language just has to be related to the language independent representation. Thus, it could be the case that no dictionaries linking the single languages with each other are necessary. This greatly facilitates the development of an automatic translation system.

According to another embodiment of the invention, the method can be used for searching, since it is capable of improving the results of search engines considerably. A user types a question into the website of a search engine. This question is analysed syntactically and morphologically, and the components of the text and their relations are extracted. This information is used in generating an internal graphical representation of the question. Ambiguities can be determined and resolved. Further, by moving away from a purely string-based approach, single words of the question can be generalized using abstractions in the structure trees. Thus, the quality of the results can be improved.

According to another embodiment of the invention, the method can be used for analysing a text. It is possible to retrieve the topics presented in a text. This can be used to categorize a text automatically. Further it can be used to find logical chains or information about semantic structures in the text.

According to another embodiment of the invention, the method can be used for generating responses to the text input to the language processing system. For instance the system can generate an automated answer to a question which is sent by a user requesting support. In contrast to methods known in the art, the method can analyse and "understand" the text and can create an appropriate answer to the question using the knowledge represented in the meaning world model.

Further embodiments are possible. As the method offers a language independent representation of a text, the step of processing can be replaced by a vast number of different steps. Thus, the invention can be used very universally. Further the single embodiments described herein can be combined arbitrarily if desired.

With each embodiment the text generated at the step of processing might be output to the user as written or spoken language or as depiction. If the step of processing comprises the step of analysing the text, the output can also comprise statistics or a list of topics or input for searching.

For improving and facilitating the building of the databases used within the system, the knowledge needed at the steps of the method according to the invention might be input via a web interface. The knowledge might include lexicon tags, content of the meaning world model, grammar information, attribute representation or the like. This information can be input by an open group of users who enter the information in a user friendly way.

Regarding a language processing system for processing natural language and in accordance with the invention, the aforementioned object is accomplished by a method comprising the features of Claim 16. Preferred embodiments of the invention are described in the dependent claims 17 to 27.

The aforementioned objectives are further accomplished by a method of developing a language processing system according to Claim 28 and its embodiments as described in dependent Claims 29 and 30.

There are several ways to design and further develop the teaching of the present invention in an advantageous way. To this end, reference is made to the patent claims subordinate to Claims 1, 16 or 28 on the one hand, and to the following explanation of a preferred example of an embodiment of the invention illustrated by the drawings on the other hand. In connection with the explanation of the preferred example of an embodiment of the invention with the aid of the drawings, generally preferred embodiments and further developments of the teaching will be explained. In the drawings:

Fig. 1 illustrates an exemplary flow chart of a traditional translation process,

Fig. 2 illustrates an exemplary conceptual diagram of an embodiment of language pairs used in a traditional translation process of Fig. 1,

Fig. 3 shows a structure of a translation process,

Fig. 4 is a graphical representation of a sentence,

Fig. 5 is a possible representation of actions,

Fig. 6 shows representations of temperature (part a)) and emotions (part b)),

Fig. 7 illustrates an exemplary flow chart of a meaning world translation process,

Fig. 8 illustrates an exemplary conceptual diagram of an embodiment of language pairs used in the meaning world translation process of Fig. 7,

Fig. 9 illustrates an exemplary block diagram of an embodiment of a meaning world translation system,

Fig. 10 illustrates an exemplary block diagram of an embodiment of a meaning world system of Fig. 9,

Fig. 11 illustrates an exemplary flow chart of an embodiment of an add language process for adding a language to the meaning world translation system of Fig. 9,

Fig. 12A illustrates an exemplary flow chart of an embodiment of an add term process for adding a term to a language dictionary in the meaning world translation system of Fig. 9,

Fig. 12B illustrates an exemplary flow chart of another embodiment of an add term process for adding a term to a language dictionary in the meaning world translation system of Fig. 9,

Fig. 13 illustrates an exemplary block diagram of an embodiment of a system including one or more translation servers capable of implementing the meaning world translation system of Fig. 9,

Fig. 14 illustrates an exemplary block diagram of an embodiment of a computing system capable of implementing one or more components of electronic systems described herein.

Fig. 3 shows an example of a translation process according to embodiments of the invention. The sentence "die grüne Bank steht im Wald" should be translated using the invention. Fig. 3 shows the semantic layer 2 which is the core of the language processing system 1. The semantic layer is embedded in a relational layer 3. To this relational layer 3, several syntactic layers 4, 5, 6 are docked. Each syntactic layer represents a language: syntactic layer 4 represents German, syntactic layer 5 represents English and syntactic layer 6 represents Polish.

The text input to the language processing system 1 is input at the syntactic layer 4. The syntactic layer 4 analyses the text concerning its grammar and syntax. It can be retrieved that a "Bank" is the subject of the sentence. The "Bank" has the attribute "grün". The "Bank" performs the action "stehen" and this is done in the "Wald". This can be extracted by a syntactical and morphological analysis of the text.

The components of the text and their relation to each other can be extracted. This can be used to generate a universal, language-independent representation of the sentence which is shown in Fig. 4 as a graph. This graph can be transformed to English or any other language available. In a first step the pictogram representing the "Bank" is translated to the English word "bench". The attribute of the bench of being "grün" is translated to "green", the action "stehen" is translated to "is", and the representation of the "Wald" is tagged with the word "forest". Translating this representation to an English sentence will result in "the green bench is in the forest".

As can be seen from the example, by tagging the graphical representation with another language and by putting the words in a grammatically correct order, each language can be a source and a target language, respectively.
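The example above can be retraced with a small sketch in which each language only tags language-independent concepts. The concept identifiers (such as `BENCH`), the role names, the per-language tag tables and the fixed English sentence template are assumptions made for this illustration; the syntactic analysis of the German sentence is taken as given, and inflection is ignored.

```python
# Per-language tags for language-independent concepts (hypothetical identifiers).
TAGS = {
    "de": {"BENCH": "Bank", "GREEN": "grün", "STAND": "stehen", "FOREST": "Wald"},
    "en": {"BENCH": "bench", "GREEN": "green", "STAND": "is", "FOREST": "forest"},
}


def to_concepts(roles: dict, language: str) -> dict:
    """Replace the lemma found for each role by its language-independent concept."""
    concept_of = {word: concept for concept, word in TAGS[language].items()}
    return {role: concept_of[lemma] for role, lemma in roles.items()}


def render_english(graph: dict) -> str:
    """Tag the concepts with English words and place them in a fixed word order."""
    tag = TAGS["en"]
    return (
        f"the {tag[graph['attribute']]} {tag[graph['object']]} "
        f"{tag[graph['action']]} in the {tag[graph['location']]}"
    )


# Result of analysing "die grüne Bank steht im Wald" (assumed to be given).
german_roles = {"object": "Bank", "attribute": "grün", "action": "stehen", "location": "Wald"}
graph = to_concepts(german_roles, "de")
print(graph)
print(render_english(graph))
```

No German-English dictionary appears in the sketch: each language is related only to the concept identifiers, so a further language would need one additional tag table and its own rendering rules.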

Fig. 5 represents several possible actions performed by a human being who is shown in the centre. The actions "think", "sit", "walk" and "give" are shown.

Fig. 6 shows sample representations of two attributes. Fig. 6a) represents a temperature scale and the corresponding attributes. Generally these representations are fuzzy and do not refer to a specific value. Warm dishes will be perceived as cold if they are at 10 °C or less. They will be described as lukewarm at temperatures of 20 °C. Temperatures of 70 °C will be sensed as hot.
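The fuzzy character of such an attribute might be modelled with membership degrees between 0 and 1. In the sketch below only the three anchor values (10 °C, 20 °C and 70 °C) are taken from the text; the linear transitions between them, and the function names, are assumptions for this example.

```python
def ramp(value: float, low: float, high: float) -> float:
    """Rise linearly from 0 at `low` to 1 at `high`, clamped to [0, 1]."""
    return min(1.0, max(0.0, (value - low) / (high - low)))


def temperature_memberships(celsius: float) -> dict:
    """Degree to which a dish is perceived as cold, lukewarm or hot."""
    cold = 1.0 - ramp(celsius, 10, 20)   # fully cold at 10 °C or less
    hot = ramp(celsius, 20, 70)          # fully hot at 70 °C or more
    lukewarm = 1.0 - cold - hot          # peaks at 20 °C
    return {"cold": cold, "lukewarm": lukewarm, "hot": hot}


for celsius in (5, 15, 20, 45, 70):
    print(celsius, temperature_memberships(celsius))
```

A temperature between two anchor values thus belongs partly to both neighbouring attributes instead of being assigned to exactly one.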

Fig. 6b) depicts a 4-dimensional space for representing emotions. The emotions which can be represented are a superposition of the simplex "fearful", "surprised", "happy" and "angry". An emotion represented here is a point or a region within this 4-dimensional space.

In many embodiments, the relationship, models, syntax requirements, and the like can be part of one or many interworking computer processes. Accordingly, in embodiments of the present disclosure, a computer system employs a language independent object world, thereby providing a central hub for language translations. In an embodiment, text or speech is translated from a source language into a language-independent interpretation before translating this representation into one or more destination languages for output.

For example, the language systems of the present disclosure provide a mapping from the syntax and semantics of an input language to, for example, a graph in the language-independent meaning world. From this language-independent representation, the translation can be completed into any language or many languages. In an embodiment, the language independent graph can also link to and/or be output in a graphical or multimedia presentation. An associated translation process analyzes input text (or speech) regarding its syntax and morphology, extracts contents of the text and their relation relative to each other, generates a graph of said text as a language-independent representation of the meaning of said text, and performs processing of said text using said graph.

Such a system is generally in keeping with findings in the neurological field. One basic finding includes recognition that human cognition separates syntax and semantics. Recall the umbrella discussion from above. This separation of syntax and semantics (or of language-dependent information and language-independent information) is a portion of the translation process of the present disclosure. For example, when text is analyzed regarding its syntax and morphology, the grammatical structure is analyzed. This results in a basic understanding of the text. The contents of that text are extracted. For example, the text generally includes sentences which may comprise a subject, object and verb. In an embodiment, each component can be extracted and its function within the sentence can be retrieved. These components and their relation to each other are used in processing the text into a graph. The components form nodes of the graph and the relations between the components are represented by edges. In an embodiment, this graph can be made to be somewhat or completely independent of the language that is used with the input (or output) text. The graph mainly includes semantic information which can be straightforwardly used for further processing.

During analysis of the text, the system accesses information about the grammar of the language used. Each language includes its own specific grammar providing rules as to how words are arranged. Another aspect of this disclosure is to provide a relatively straightforward, non-technical way to generate these grammar rules. To enable users who may have little or no programming knowledge to write grammar rules, grammatical data might be entered through a grammar editor. The grammar rules include a certain formalization of the possible structures of the given language. By that, time-consuming development of different grammars for every single language can be avoided or reduced, and instead, quick and efficient prototyping is possible. That way, new languages can be integrated into a language processing system as disclosed herein more quickly and easily.

In an embodiment, the text analysis is performed by a parser acting at a syntactic layer of a language processing system. In one aspect of the disclosure a language processing system may be configured modularly to enable reusability, adaptability, and expandability of the system. A parser might perform segmentation and tokenization of the text. Segmentation indicates the determination of the sentential units of the text, where tokenization includes the identification of the concrete word forms within the sentence. After performing segmentation and tokenization, the elements as well as their relationship within the sentence can be analyzed in terms of syntax and morphology.

To improve modularity and to gain a process that can be universally used, the syntactic layer objects might be associated with the language processing system. By doing so, different languages can be easily integrated into the language processing system by adding a new parser and dictionary for each language. As the processing within the system is performed with a language independent representation of the text, any language can be processed. Texts in new languages are transformed to the language independent representation, and then any other existing language pairing can be used for the conversion. Thus, the process can be used rather universally.

Each language docked to the language processing system might be represented in a separate set of syntactic layer objects. Thus, syntactical issues can be configured independently of each other. Furthermore, it is possible that single languages can have common syntactic layer objects, such as a parser or major portions of a parser. For instance, High German, Swiss German, and Austrian German have mainly common grammar rules; only a few rules differ. In this case a single parser may handle each language, having most rules common to the several languages and some rules that are language dependent. This reduces the work of changing rules of the single languages.

Language independent information is extracted at this syntactic and/or an optional relational layer. Generally, language independent information comprises objects, actions and attributes and their relations. Objects are usually represented by nouns in languages like German, English or Chinese. Actions are generally described by the verbs of the text, but adjectives can also represent actions. For instance, two companies can be tagged with "compete" or "competitive." Attributes can be sensory attributes like color, temperature, size or quality as well as attributes like emotions. These objects, actions, and attributes are extracted from the text by the syntactic or relational layer objects and are translated into the meaning world representations (which can be referred to herein as the semantic layer).

At the step of processing the sentence, objects, actions, and attributes of a sentence or phrase of the text are linked together and represented as a graph. The graph facilitates the processing of the text within the language processing system, as graphs can be easily represented as matrices.
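As a minimal sketch of how such a graph can be held as a matrix, consider the sentence "The boy is running to the park" with the nodes "boy," "run," and "park." The edge labels "agent" and "destination" are assumptions made for illustration only; the disclosure does not prescribe a label set.

```python
# Nodes are the extracted object and action terms of the sentence.
nodes = ["boy", "run", "park"]
# Each edge ties the action to a participant; labels are hypothetical.
edges = [("run", "boy", "agent"), ("run", "park", "destination")]

index = {name: i for i, name in enumerate(nodes)}
matrix = [[0] * len(nodes) for _ in nodes]
for source, target, _label in edges:
    # The graph is treated as undirected here, so the matrix is symmetric.
    matrix[index[source]][index[target]] = 1
    matrix[index[target]][index[source]] = 1

for row in matrix:
    print(row)
```

A weighted or labeled variant would store the weight or a label code in place of the 1 entries.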

To ensure the language independency of the meaning world system, objects, actions, and attributes can be represented based on a unique ID. However, it is preferable that each meaning world representation of a term also have a picture or illustration of the meaning for ease of working with the meaning world (especially by non-programmers). For instance a car can be represented by a pictogram of a car, a bench can be represented by a pictogram of a bench, the attribute "green" can be a green area, and "to give" can be represented by pictograms of a person handing over an object to another person. Thus, the graphs and meaning world objects can be understood through the graphical presentations, without objects, actions, or attributes being tagged with terms of a specific language.

The step of processing the text might comprise the step of reasoning over the extracted semantics of the text. This can be done by comparing the extracted semantics with a meaning world model or determining the distance between the meaning world objects involved. "Distance" in this instance indicates the relative relation between different objects in the meaning world. Closer objects are those that are directly linked or strongly correlated. The more closely tied a set of objects are within the meaning world, the more likely a translation is going to be correct. The meaning world comprises language independent term objects ("LIT objects"). The LIT objects' main task is to represent objects which are usually represented by nouns in languages like German, English, or Chinese. The meaning world consists of several two-to-n-dimensional (2-n) spaces that store the objects (or prototypes of them) and order them in meaningful combinations. Other parts of speech, such as verbs, can also be represented by the objects.

In one aspect, the disclosure provides systems and processes for providing translation systems. In this case, a graph, or other semantic representation, of input text is language independent, and it can be the basis of the translation to any language. In general the steps of the process include analyzing the original text regarding its syntax and morphology, using the components of the text and their relations to each other to generate a graph of the text as a language-independent representation. After an optional semantic checking, the language-independent representation is transferred to a textual representation in the target language(s). This step of transferring can be performed by the syntactic layer, as this layer already includes syntactical and morphological information of the target language(s). Each language can be theoretically translated in each other language with minimal added complexity for each new language added to the system. Because there is a language independent platform in between, each language just has to be adapted to the language independent representation. Thus, no dictionaries linking the single languages with each other are necessary (unlike the prior art model described above). This facilitates the development of an automatic translation system.

According to another aspect, the disclosure can provide a process to improve searching in a search engine. For example, a user types a question into the webpage of a search engine. This question is analyzed regarding its syntax and morphology, and the components of the text and their relations are extracted. This information is used to generate an internal graph of the question. Using the language-independent meaning world model, ambiguities can be determined and resolved. Further, by departing from a string-based approach to search queries, single words of the question can be generalized using abstractions like in the structure trees and relations among words. Thus, the quality of the results can be improved. In yet another aspect, the disclosure provides a process to analyze a text and retrieve information about, for example, the topic of the text. This can be used to categorize a text automatically. Further it can be used to find logical chains or information about semantic structures in the text.

With each embodiment the text generated at the step of processing might be output to the user as written or spoken language or as a depiction. If the step of processing comprises the step of analyzing the text, the output can also comprise statistics or a list of topics or input for searching or other processing.

For improving and facilitating the building of the data structures, databases, and representations used within the system, the knowledge used at the steps of the process according to the invention might be input via a web interface. The knowledge might include lexicon tags, content of the meaning world, grammar information, attribute representation or the like. This information can be input by an open group of users who enter the information through a user friendly interface, rather than a programming-type interface.

To facilitate a complete understanding of the invention, the remainder of the detailed description describes the invention with reference to the figures, wherein like elements are referenced with like numerals throughout.

In contrast to the drawbacks associated with Fig. 1 and 2, Fig. 7 illustrates an exemplary flow chart of a meaning world translation process 300 of embodiments of the present disclosure. As shown in Fig. 7, at Block 310, text of any length, for example a sentence or paragraph, is entered into an electronic translation system according to embodiments disclosed herein. For example, "The boy is running to the park" could be entered at Block 310. At Block 312, the system parses the sentence to extract the root form of the key concepts of the text. Typically, this will be at least the subject, verb, and sometimes object of the sentence. In the illustration, there are three key terms: (1) boy; (2) run; and (3) park. These terms are translated into a language-independent "meaning world" graph (Block 314). In an embodiment, the graph includes a node for each of the key concept terms and an edge to illustrate their connections to the other terms. The key concepts are translated into the chosen language (Block 316). In the example, the destination language is German: (1) Junge; (2) laufen; and (3) Park. A language-specific parser module reforms the sentence with the proper articles, verb forms, and the like (Block 318), and the completed sentence "Der Junge läuft zum Park" is output to the user (Block 320).
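Blocks 312 to 316 of this example might be sketched as follows. The lookup tables, the object IDs 101, 202 and 303, and all function names are hypothetical stand-ins for the parser, meaning world, and dictionaries; the generation step of Block 318 (articles, inflection, word order) is deliberately left out because it depends on language-specific grammar rules.

```python
# Hypothetical toy tables; a real system would use a parser and dictionaries.
EN_ROOT_FORMS = {"boy": "boy", "running": "run", "park": "park"}
EN_TO_OBJECT_ID = {"boy": 101, "run": 202, "park": 303}
OBJECT_ID_TO_DE = {101: "Junge", 202: "laufen", 303: "Park"}

def extract_key_terms(sentence):
    """Block 312: reduce the sentence to the root forms of its key concepts."""
    words = sentence.lower().rstrip(".!?").split()
    return [EN_ROOT_FORMS[w] for w in words if w in EN_ROOT_FORMS]

def build_graph(terms):
    """Block 314: nodes are object IDs; edges tie the verb to the other terms."""
    subject, verb, destination = (EN_TO_OBJECT_ID[t] for t in terms)
    return {"nodes": [subject, verb, destination],
            "edges": [(verb, subject), (verb, destination)]}

def translate_nodes(graph):
    """Block 316: look up the target-language term for each node."""
    return [OBJECT_ID_TO_DE[n] for n in graph["nodes"]]

terms = extract_key_terms("The boy is running to the park")
graph = build_graph(terms)
german_terms = translate_nodes(graph)
print(terms, german_terms)
```

Note that the graph itself carries only IDs, so the same `graph` value could be handed to a dictionary for any other target language.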

Although a simplified example, the process 300 shown in Fig. 7 illustrates the basic concepts of a meaning world and its graphical properties. An artisan will recognize from the disclosure herein many complicating and challenging natural language input scenarios, and as discussed in the following, the meaning world representation provides significant flexibility and power to solve such scenarios.

Fig. 8 illustrates an exemplary conceptual diagram of language pairs of the present disclosure. For example, as shown in Fig. 8, the four languages of Fig. 2 use four (4) language pairs as opposed to six (6). In addition, the inclusion of an additional language, Italian, uses only one additional language pair. Thus, the difference at five languages is between five (5) language pairs for the present disclosure and ten (10) for the approach of Fig. 2.
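The counts above follow from simple arithmetic: linking every language directly to every other requires n(n-1)/2 pairings for n languages, whereas linking each language only to the central meaning world requires n. A short sketch (function names are illustrative only):

```python
def direct_pairs(n):
    """Pairings needed when every language is tied to every other language."""
    return n * (n - 1) // 2

def hub_pairs(n):
    """Pairings needed when every language is tied only to the meaning world."""
    return n

for n in (4, 5, 10):
    print(n, direct_pairs(n), hub_pairs(n))
```

The gap widens quickly: the direct approach grows quadratically with the number of languages, while the meaning world approach grows linearly.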

Thus, as indicated by Fig. 8, each language ties to the central meaning world, rather than any other specific language. This provides a modular approach to the translation system, as no language must be tied to any other language individually. Moreover, systems and processes according to this disclosure result in a significantly less complex system that is also generally much less costly to develop than the prior art systems.

Embodiments of the present system could, for example, be particularly useful in an international internet chat or instant messaging session. An embodiment of the disclosed system could be incorporated into a back end instant messaging system, and each message could be translated into the preferred language of the individual end users during transmission of the messages. There are numerous other applications for the translation system of this disclosure which will be discussed in more detail below. Embodiments of the translation system are characterized by a modular design for multilingual natural language processing and multimodal interaction. Modules dedicated to different languages and others that are language independent can be combined into a working system able to analyze, reason about, search in, translate, and generate natural language. Embodiments of the system handle multimodal interaction: written and spoken natural language input and output, as well as output as language, speech, depiction or a combination of them. The modules preferably are designed in a way to be reusable by other programs as well. Where possible, the modules are language independent, thus aiding in reusability. Well-defined interfaces and general interfacing programs manage the communication between the system components. By this design, every language can be translated into every other language. Languages to be translated also can be variations within a single language, e.g. Swiss German is translated into High German, or colloquial style is translated into formal style. Features of various embodiments may advantageously include some or all of the following: modularity: easy to handle, reusable, configurable; web-based: accessible from everywhere; ergonomically highly sophisticated software: usable by everyone; community-based: extendable by everyone; universality: every language can be integrated; and visual-graphical core: language independent and cognitively adequate.

Embodiments of this disclosure model and simulate human cognitive processing for the optimization of natural language understanding and generation, translation, search engines, or other communication tasks.

A human cognition-based approach, as here, separates syntax and semantics according to the human brain processes, and it differentiates among the multiple meanings of words. Syntactic rules or language-dependent word forms are handled in specific components. Semantics is processed in a language independent layer, referred to as the meaning world. This approach is based on recent findings in neurological research. As discussed in the foregoing umbrella example, the concept of "umbrella" when communicated is the object tagged with the language specific word. The people involved know the object without using language. If they want to go outside while it is raining, they activate the "tag" by means of the language-specific dictionary to communicate with other humans: "May I take this umbrella?" or "Könnte ich diesen Schirm nehmen?"

This helps explain the advantage of the present approach to language processing in that the meaning is represented in a human way and is therefore language independent. All natural languages can be added, since they are using the same meaning world. Not only can this approach be useful for translation, but it can also aid with many other jobs. In an embodiment of the meaning world, information can be added, processed and stored without special language syntax being necessary. As soon as an information unit is present in the meaning world, new languages can be added very easily by means of binding the syntactical representation to the language independent unit.

Fig. 9 illustrates an exemplary block diagram of an embodiment of a meaning world translation system. For example, Fig. 9 includes system components or modules that can be used to achieve the language processing and translation system disclosed. The meaning world system 522 includes the language independent representations of concepts. The meaning world system 522 also provides multimedia access to certain users to visualize or hear representations of terms and concepts that are stored within it. As illustrated, each language semantic system 524 is tied to the central meaning world 522. This is accomplished by one or more linguistic toolkits 526. The language semantic systems 524 also comprise one or more language dictionaries 528. Each language represented within the translation system will typically have its own dictionary 528 to provide the specific terms of that language. The dictionary entries are tied to specific objects in the meaning world system 522. In some instances, however, languages may be sufficiently related to be able to share all or parts of a linguistic toolkit 526. For example, various dialects may be represented with different languages, but generally follow similar syntactic rules, such as sentence structure and word order. In this case, a single linguistic toolkit may handle the parsing of each language with some, all, or substantially all grammar rules shared. In an embodiment, a linguistic toolkit provides a parser 530 for extracting terms from sentences to be translated, as well as formulating grammatical sentences from object world graphs. The parser 530 relies on grammar rules 532, inflection classes 534, templates 536, and the like to properly construct and deconstruct sentences in the associated language. The linguistic tool templates 536 help provide straightforward expansion of the terms in a language dictionary for building or modifying a language within the system.
For example, the templates can provide sentence fragments that will help classify new terms properly. More specifically, if a user wishes to add the word "tiger" to the language dictionary, for example, he or she may be presented with templates to help the system understand the parts of speech or a frame of reference. A very straightforward example of a set of templates may be "A tiger," "I tiger," and "the tiger ball." The user can select which one applies, and the system can learn to classify the new term. In this case, the system learns that "tiger" is a noun that can take an indefinite article, rather than a verb or an adjective, respectively. Similarly, the system may present templates to determine if a verb follows a regular or irregular conjugation. By this process, the system can be extended by everyone, without linguistic knowledge or knowledge about the other languages of the system being necessary.
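The template mechanism for the "tiger" example might be sketched as below. The mapping from each template to a part of speech follows the example in the text; the table layout and the names `TEMPLATES`, `render_templates`, and `classify` are assumptions for illustration.

```python
# Each template pairs a sentence fragment with the part of speech it signals.
TEMPLATES = [
    ("A {term}", "noun"),
    ("I {term}", "verb"),
    ("the {term} ball", "adjective"),
]

def render_templates(term):
    """Fill the new term into every fragment so the user can pick one."""
    return [pattern.format(term=term) for pattern, _ in TEMPLATES]

def classify(term, chosen_fragment):
    """Return the part of speech tied to the fragment the user selected."""
    for pattern, part_of_speech in TEMPLATES:
        if pattern.format(term=term) == chosen_fragment:
            return part_of_speech
    raise ValueError("fragment does not match any template")

options = render_templates("tiger")
print(options)
print(classify("tiger", "A tiger"))
```

The user only chooses among ready-made fragments, so no grammatical terminology has to be shown in the interface.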

As described above, the parser 530 is a component for translating to and from the meaning world graphs. In another embodiment, however, a relational processor 527 connects the semantic system 524 and parser 530 to the meaning world system 522. In such an embodiment, the semantic system may generate a graph of input text that is still tied to the source language. This graph may be further abstracted into its language independent form by the relational processor 527. The relational processor 527 may extract verb tense, prepositional phrase information, and other sentence details to help organize or augment the language independent graph. For example, the relational processor may indicate "definite article," "continuous form," or "directional information" for the example illustrated in Fig. 7. In various embodiments, as will be easily understood by one of skill in the art, the parser 530 and relational processor 527 can be one or multiple modules, act together simultaneously or separately in sequence, and can share responsibilities in any of a number of ways apart from those described herein. Those of skill in the art will also recognize from the disclosure herein other configurations that will provide the same or substantially similar functionality.

Turning to Fig. 10, an embodiment of the data structures representing the meaning world 522 is illustrated. In general, each LIT object 638 represents a language independent specific term, such as the terms "building," "room," "city," "house," and "office building" shown in the figure. In an embodiment, each object is a data structure including an object ID 640, a set of one or more relational links 644 and an optional set of one or more hierarchy links 646. The object ID 640 can be a number or code that identifies the computer record in the computer storing the object, but would generally be unrecognizable to a user. In an embodiment LIT objects 638 tie to other related terms using relational links 644.
As illustrated, "city" and "building" are related because a city includes a number of buildings; likewise, a "building" is made up of a number of "rooms" so those two objects are linked. In an embodiment, these relational links 644 may be weighted to indicate stronger or weaker ties. Similarly, objects that are related in a class-type-sub-type kind of relationship can be linked by hierarchy links 646 and may form a type of tree structure. In Fig. 10, this relationship is illustrated by the "building," "house," and "office building" objects. "Building" is a generic term that encompasses both "house" and "office building" as more specific building types. Although not pictured, "house" itself could then be linked to sub-types "cottage," "ranch," and "townhouse," for example.

LIT objects 638 can also include dictionary links 648. Moreover, dictionary objects 528 include links from the specific language terms 650 to the appropriate LIT objects 638. For example, Fig. 10 illustrates that the term "Bâtiment" from the French language dictionary object 528, the term "building" from the English language dictionary and other included languages would link to the "building" object 638. Similarly, the terms "office building" in English, "Bürogebäude" in German, and "immeuble de bureaux" in French would be linked to the "office building" object 638.

Each object may also include or link to one or more media representations, such as a visual representation 642. The visual representations 642 can be used to illustrate the associated term in a variety of situations. It is particularly useful for users who are helping to add a new language to the system, as it can be displayed for the user to recognize what term they should link to it with the new language dictionary 528. In some embodiments, audio files, video files, picture files, and the like can all be used as associated media representations. For example, "to whistle" may be better associated with a sound file or a sound file and a picture rather than with just a visual representation.
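The LIT object record of Fig. 10 might be sketched as a small data class. The field names mirror the reference numerals in the text; the concrete IDs, link weights, file names, and the helper functions `find_object` and `translate` are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class LITObject:
    object_id: int                                                    # 640
    visual: Optional[str] = None                                      # 642
    relational_links: Dict[int, float] = field(default_factory=dict)  # 644
    hierarchy_links: List[int] = field(default_factory=list)          # 646
    dictionary_links: Dict[str, str] = field(default_factory=dict)    # 648

# Hypothetical miniature meaning world: city, building, room, house, office.
world = {
    1: LITObject(1, "city.png", {2: 1.0}),
    2: LITObject(2, "building.png", {1: 1.0, 3: 1.0}, [4, 5],
                 {"en": "building", "fr": "bâtiment"}),
    3: LITObject(3, "room.png", {2: 1.0}),
    4: LITObject(4, "house.png"),
    5: LITObject(5, "office.png", dictionary_links={
        "en": "office building", "de": "Bürogebäude",
        "fr": "immeuble de bureaux"}),
}

def find_object(term, language):
    """Follow a dictionary entry to the language independent object."""
    for obj in world.values():
        if obj.dictionary_links.get(language, "").lower() == term.lower():
            return obj
    return None

def translate(term, source, target):
    """Translate a single term by passing through its LIT object."""
    obj = find_object(term, source)
    return obj.dictionary_links.get(target) if obj else None

print(translate("Bürogebäude", "de", "fr"))
```

Because every dictionary points only at object IDs, adding a language means adding entries to `dictionary_links` without touching any other language.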

Attributes of objects can also be linked in the object world system 522, and may have special relational ties. For example, an attribute may be an emotional scale, a color representation, or physical attributes like temperature, size, or quality. Relational ties may allow specific terms to be placed along a scale, so that related terminology can be tied to specific or relative values along the scale. For example, "tiny," "small," "regular," "large," "huge," "enormous," and "infinite" might all fall along a size scale. The attribute space itself can be multidimensional. Attributes can also be represented in a structure tree, e.g. "scarlet," "carmine," and "crimson" are subtypes of "red." That way the units of the meaning world are connected to each other in multiple ways in a network allowing complex deductions necessary for processing natural language.
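A minimal sketch of the two attribute structures mentioned here, a scale and a structure tree, is given below. Treating scale positions as equally spaced ordinal ranks is an assumption; the text only states that terms fall along a scale.

```python
# Ordered size scale; a term's index serves as its (assumed) relative position.
SIZE_SCALE = ["tiny", "small", "regular", "large", "huge", "enormous", "infinite"]
# Structure tree for a color attribute: parent term -> sub-types.
COLOR_SUBTYPES = {"red": ["scarlet", "carmine", "crimson"]}

def scale_distance(first, second):
    """Number of scale steps between two size terms."""
    return abs(SIZE_SCALE.index(first) - SIZE_SCALE.index(second))

def is_subtype(term, parent):
    """True if the term is listed as a sub-type of the parent attribute."""
    return term in COLOR_SUBTYPES.get(parent, [])

print(scale_distance("small", "huge"))
print(is_subtype("crimson", "red"))
```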

Moreover, in an embodiment, the meaning world system 522 can be represented as a virtual world or a set of virtual worlds. For example, a user interface may be provided that would allow a user to walk through a virtual representation of a meaning world system 522. A user may, for example, first see a "city" made up of "buildings" and be able to zoom into any specific building, such as a "house" or "office building." From there, the user may be able to walk into a "room" of the house, and each room may have objects representing other terms such as "couch," "chair," "bed," "table," and the like. Each object can also be tagged to display the language representations of the terms that are linked to that LIT object 638's dictionary links 648. The virtual world can also include representations of people and actions, as well as modifying attributes. Therefore, navigating to a "door" object in a virtual world may display not only a "door" English tag, but also a "red" color tag, a "wooden" tag, and the like.

In another embodiment, multiple "worlds" may exist with links between them. Straightforward objects can open up new worlds. For example, a room scene may depict a window with the moon showing outside. Clicking on the moon could then lead to another object world that is space oriented. A representation of a human being can lead to an object world that simulates the cellular basis or body parts of human beings. If a user navigates to an office in an office building, there may be a representation of papers on a desk; navigating to the papers (such as by clicking on them with a cursor and a mouse input tool, for example) may also open up a tree interface to show the objects 638 connected through the hierarchy links 646. For example, "paper" may be related to "advertisement," "report," "periodical," and the like. In turn "periodical" may link to "newspaper" and "magazine" and so on.

Navigation through such a world can be a useful learning tool in and of itself because a user can elect to view language tags in whatever languages are connected to the meaning world. In an embodiment, a user may elect to view tags for a language that they wish to know in order to help them learn that language. Similarly, in an embodiment, terms in a user's primary language and another language may be displayed so that the user can correlate the two along with the visual representation.

Various relations between objects 638 in the meaning world 522 can also be modeled graphically. Spatial, temporal, causal, or metaphorical relations between meaning world objects 638 (and also the other types of relations) are ideally suited for a graphical depiction. For translation, these kinds of relations are the base for determining by which structures and wordings they are to be expressed verbally because languages differ in the way they express these relations: some languages use prepositions, others realize them as morphemes attached to a noun, etc. The best way to generate adequate structures and wordings is based on a neutral, abstract, and graphical representation. By this process, the generation component doesn't need to do a complex restructuring of the input structure (as is done by classical machine translation systems), but simply chooses between the available structures of the target language using a mapping from relations to structures.

Returning to the translation process of Fig. 7, additional detail can now be explained with respect to the meaning world system 522. When selecting proper translations, knowledge about the topics involved improves the translation by filtering ambiguous meanings that do not belong to these topics. The topics can often be discerned from the relations of the translation text's graphical representation (see Block 314). For several topics, there will be many clusters in the N-dimensional semantic space. Efficient and fast clustering algorithms, such as the k-means clustering algorithm, are used to find the cluster centers. These cluster centroids represent the topics of the text. If there are ambiguous translations, the topic can be used to resolve them. For example, an input text may include, "The dog was a Siberian Husky." The term "dog" actually has multiple meanings including "a domesticated canine," "a despicable man," or "an iron bar driven into a stone or timber to provide a means of lifting it." Each of these definitions may have different translations in other languages and are therefore also ambiguous to the object world system 522. However, the context of the sentence can help select the correct object world object to use (the one that corresponds to "a domesticated canine") because the other objects of the sentence, in particular "Siberian Husky," would be more closely tied in the object meaning world with that object than with the others. Conceptually, "Siberian Husky" and the proper "dog" object would both appear in an animal or pet-related subset of the object world, for example.
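The "dog" example can be sketched as a shortest-path comparison over weighted relational links. This is one possible reading of "more closely tied," not the method of the disclosure: the miniature meaning world, the link weights (interpreted here as distances, so smaller means closer), and the use of Dijkstra's algorithm are all assumptions for illustration.

```python
import heapq

# Hypothetical links: (object, object, distance); smaller means closer.
LINKS = [
    ("Siberian Husky", "animal", 1), ("dog (canine)", "animal", 1),
    ("dog (despicable man)", "person", 1), ("dog (iron bar)", "tool", 1),
    ("animal", "person", 3), ("person", "tool", 3),
]

graph = {}
for a, b, w in LINKS:
    graph.setdefault(a, {})[b] = w
    graph.setdefault(b, {})[a] = w  # relational links are symmetric here

def shortest_distances(start):
    """Dijkstra's algorithm: distance from start to every reachable object."""
    dist = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist

senses = ["dog (canine)", "dog (despicable man)", "dog (iron bar)"]
distances = shortest_distances("Siberian Husky")
best = min(senses, key=lambda s: distances[s])
print(best, {s: distances[s] for s in senses})
```

With several context terms, the distances from each context term to a candidate sense could be summed before taking the minimum.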

The syntactic analysis of the text often creates many syntax graphs and some unresolved connections between graph nodes, as in the above "dog" example. A statistical approach to choose the best graph is generally used to distinguish between ambiguities: Bayes' theorem, in an embodiment. Bayes' theorem states that the probability of a certain graph given the evidence (the semantic entities) is proportional to the likelihood of the semantic entities given that graph times the prior probability of that graph. Other algorithms and statistical analysis, known as or derived from standard statistical principles, can also be used to help disambiguate translations from a source language to the LIT object interpretation and will be known to those of skill in the art.
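A minimal sketch of such a Bayesian selection follows: each candidate graph receives a score equal to its prior times the likelihood of the observed semantic entities, and the scores are normalized to posteriors. All probability values are invented for illustration and carry no empirical meaning.

```python
# Candidate interpretations of "dog": (prior, likelihood of seeing
# "Siberian Husky" in the same graph). All numbers are hypothetical.
candidates = {
    "canine": (0.70, 0.90),
    "despicable man": (0.20, 0.05),
    "iron bar": (0.10, 0.01),
}

def posteriors(table):
    """Bayes' theorem: posterior is proportional to prior times likelihood."""
    unnormalized = {g: prior * lik for g, (prior, lik) in table.items()}
    total = sum(unnormalized.values())
    return {g: value / total for g, value in unnormalized.items()}

result = posteriors(candidates)
best_graph = max(result, key=result.get)
print(best_graph, round(result[best_graph], 3))
```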

One aspect of an embodiment of the language processing system disclosed is a set of tools that can help a user edit languages or add new ones altogether. As already explained, users can navigate through a virtual world that helps represent terms included in the language independent meaning world. It would be of great use to allow multiple interested parties to help add new language terms, correct misused terms, and even add new languages to the meaning world. While all of this can be done by skilled programmers, it would be quicker and easier to allow the collective abilities of a great number of users to grow and correct the meaning world. This type of group work has already been illustrated with the "wiki-" movement and websites such as "Wikipedia." In an embodiment, selected, qualified users, such as linguists, language professors, and the like, would be allowed to add languages or edit existing ones; in another, any interested user may be allowed to add or edit a language.

In an embodiment, the system creates a certain formalization of possible structures of the language. By that, time-consuming development of different grammars for every single language can be avoided and instead, quick and efficient prototyping is possible. In that way, new languages can be plugged in quickly and easily. The grammars are used by both the language analysis and generation components.

Additionally, graphical user interfaces, which may be called Lexi-Wikis, allow users to enter words into the language-specific dictionaries 528. Lexi-Wikis are designed to be usable by everybody. From the respective words, the tools generate example sentences to be simply selected or modified by the user. Which forms and how many word forms have to be presented to the user is determined by different language-specific inflectional algorithms. The user-selected examples are translated into a complex representation which can be processed by the program. In an embodiment, the underlying morphological process uses linguistic knowledge and frequency information to determine the minimum of information the user has to provide. It thus anticipates the most probable word forms so that as few as possible word forms and as few as possible actions are required from the user. By this process, the mental load or the intelligence is transferred from the user side to the software side.

Very often users will not be able to position semantic entities absolutely, but they will very well be capable of telling the dissimilarity to other semantic entities. Multidimensional scaling is employed, an algorithm designed to place multidimensional points based on a dissimilarity matrix, that is a matrix containing the distances (or dissimilarity) to other semantic entities. These algorithms can be fuzzy, which is necessary, as no two people will choose exactly the same distance. They will rather have a consensus of a general strength (as in "far away" or "very close to").
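One simple variant of multidimensional scaling minimizes the squared mismatch ("stress") between the distances of the placed points and the user-supplied dissimilarities by gradient descent. The sketch below is such a variant, not necessarily the algorithm used in an embodiment; the three entities, their dissimilarities, the starting positions, the step size, and the restriction to a single axis are all assumptions chosen to keep the example small.

```python
import math

def mds(dissimilarity, start, steps=300, rate=0.05):
    """Move points so their pairwise distances approach the dissimilarities."""
    points = [list(p) for p in start]
    n = len(points)
    for _ in range(steps):
        grads = [[0.0] * len(p) for p in points]
        for i in range(n):
            for j in range(i + 1, n):
                diff = [a - b for a, b in zip(points[i], points[j])]
                dist = math.sqrt(sum(c * c for c in diff))
                if dist == 0.0:
                    continue  # coincident points give no direction
                error = dist - dissimilarity[i][j]
                for k, c in enumerate(diff):
                    g = 2.0 * error * c / dist
                    grads[i][k] += g
                    grads[j][k] -= g
        for p, g in zip(points, grads):
            for k in range(len(p)):
                p[k] -= rate * g[k]
    return points

# Hypothetical user judgments for "small", "regular", "large".
D = [[0, 1, 2],
     [1, 0, 1],
     [2, 1, 0]]
placed = mds(D, start=[[0.0], [0.5], [1.5]])
gaps = [abs(placed[0][0] - placed[1][0]),
        abs(placed[1][0] - placed[2][0]),
        abs(placed[0][0] - placed[2][0])]
print([round(g, 3) for g in gaps])
```

Fuzzy judgments such as "far away" or "very close to" would first have to be mapped to numeric dissimilarities, for example by averaging over many users.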

Turning to Fig. 11, a process of adding a language to a meaning world system is described. In Block 760, a user logs on to the system. In some embodiments, the user may elect to log into a specific "user modification" mode, which can help prevent unauthorized or inadvertent alterations to the system. From a menu, the user may elect to add a new language (Block 762). In creating a new language, a syntax parser must be created to deconstruct and generate sentences. In an embodiment, the system includes template rules that the user can select when applicable (Block 764). For example, a rule may indicate that adjectives typically modify a noun coming after them (e.g., English) or that they typically modify a noun coming before them (e.g., French). Modifying rules can explain exceptions to this and the like. Once a parser is created, terms can be added to the new language's dictionary (Block 766). Each term is linked to meaning world objects (Block 768).

Linking new terms can be done in any number of ways, including the processes described with reference to Fig. 12A and 12B. In one embodiment, a user logs into the system (Block 870) and chooses to add a term to a specific language dictionary (Block 872), such as through a menu system. The user can enter the term (Block 874). The system may provide template questions to help provide proper usage context (Block 876). For example, templates can help the system categorize the terms by part of speech, regular or irregular conjugation of verbs, and the like. The responses may also help provide a specific meaning world context to help direct the user to the correct meaning world or area of the meaning world in which the term's language independent object resides. The user can also browse the virtual meaning world (Block 878) and select the virtual representation to which the new dictionary entry should be linked (Block 880).

In an alternative embodiment, Fig. 12B illustrates another process by which a user can add words to the dictionaries. As shown in Fig. 12B, the user logs onto the system (Block 882). The user may browse the meaning world (Block 884). Selecting objects in a specific context (Block 886) can display the terms that are linked to that object, such as by showing a pop-up balloon in the virtual world. When a user selects an object, if no term is associated, the user may add a term to "tag" the object (Block 888). Similarly, a user may alter tags to correct or enhance language dictionaries. For example, a meaning world representation of a saxophone may be tagged with the term "instrument" or "musical instrument" in the English dictionary. A user may edit the tag to show the more precise term in the hierarchy by adding "saxophone." Moreover, external resources can be linked into the system so that knowledge representations available on the Internet, in public or private databases, and the like can be used within the components of the language system. Resources to be linked may include, for example, DBpedia, Wiktionary, Open Street Map, scientific taxonomies, ontologies from the Semantic Web®, users' own taxonomies, etc. In an embodiment, consistency check components verify the consistency of different representations and enable correct computing over potentially heterogeneous knowledge sources. Even different media types can be integrated such as graphics, videos, and audio.

Turning to Fig. 13, an embodiment of a basic translation system and means for accessing it are illustrated. Although such a translation system can take many forms, a web-based translation system can provide easy access for a large number of interested users. For example, a computing system 994, such as a server, may store some or all of the programming code that when executed produces some or all of the functionality of the meaning world system 522, including the language semantic systems 524. The server 994 may electronically communicate with a public or private, local or wide area network 992, such as, for example, the Internet. In turn, various users can electronically communicate with the translation computing system using other network-enabled devices 990a, 990b. Suitable user devices include personal computers, laptop computers, data network enabled phones or other mobile devices (such as, for example, Blackberry® devices, Apple iPhone® devices, other PDAs, mobile phones, and the like). In some embodiments, users may access the translation system through a web interface in a browser or through a stand-alone program installed on the user device.

One user may use a personal computer 990b to access a translation service, input text for translation, select source and destination languages, and receive the appropriate translated text as described with reference to Fig. 7. Meanwhile, another user may edit or add languages to the system through a different interface on their computer 990a. The language translation system is preferably scalable to allow a number of users to access the system at any given time. With this approach, multiple users may attempt to edit languages simultaneously. In a preferred embodiment, the translation system may provide a lockout mechanism to allow only one user to edit a specific dictionary 528 entry or a specific meaning world LIT object 638, for example, at any given time.
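One way such a lockout mechanism could work is sketched below. This is an assumption-laden illustration rather than the disclosed implementation: the class `EditLockRegistry`, its method names and the entry identifier format are hypothetical, and a real multi-server deployment would need a shared lock store instead of in-process state.

```python
import threading

class EditLockRegistry:
    """Grants at most one editor per dictionary entry or meaning world object."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the owner table itself
        self._owners = {}               # entry id -> user currently editing

    def acquire(self, entry_id, user):
        """Return True if `user` may edit `entry_id`, False if someone else holds it."""
        with self._guard:
            owner = self._owners.get(entry_id)
            if owner is None or owner == user:
                self._owners[entry_id] = user
                return True
            return False

    def release(self, entry_id, user):
        """Release the entry; only the current owner can do so."""
        with self._guard:
            if self._owners.get(entry_id) == user:
                del self._owners[entry_id]
                return True
            return False

registry = EditLockRegistry()
first = registry.acquire("dictionary:en:dog", "alice")       # alice starts editing
second = registry.acquire("dictionary:en:dog", "bob")        # bob is locked out
released = registry.release("dictionary:en:dog", "alice")    # alice finishes
third = registry.acquire("dictionary:en:dog", "bob")         # bob may now edit
```

Locking at the level of a single entry or object, rather than a whole dictionary, keeps the system usable when many users edit different entries at the same time.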

Although much of this disclosure focuses on the use of the meaning world and language parsing abilities in order to create a translation system, there are other uses for such a meaning world system. For example, the meaning world system can help enhance search engine capabilities. An embodiment of the system as described herein can parse a natural language search query, such as a sentence or a question. It can extract key terms and generate graphs and/or graphical language independent equivalents. Because these language independent objects 638 are also linked to related LIT objects, a user's straightforward question can be expanded to include similar words, other word forms, semantically related words, and the like.

As an example, a user types a question into the webpage of a search engine: "What recent court decisions define qualifying income tax?" This question is analyzed regarding its syntax and morphology, and the components of the text and their relations are extracted: "court," "decision," "define," "qualifying," and "income tax." This information can then be used to generate a graph of the question, as if a translation were being made. These terms or meaning world objects 638 alone may not provide all topical results, however. As such the object world relational and hierarchy links 644, 646 can be used to expand the search terms. For example, "decision" may link to the term "opinion" and "order." Similarly, "court" may link to "judge," and "income tax" could link to "IRS." These additional terms can then be used to expand the ultimate search. Thus, the quality of the results can be improved without the user having to expand their terminology or perform multiple searches.

Another aspect of this disclosure can be used to analyze a text. It is possible to use the parsers to retrieve the topics presented in a text. This can be used to categorize a text automatically. Further it can be used to find logical chains or information about semantic structures in the text. Continuing the example from above, this text analysis could be used by a web-crawler program that attempts to categorize new web pages for searching purposes. When analyzing text, output can comprise statistics or a list of topics that may be used to tag web-pages for a search query. Similarly, libraries could use a similar system to help categorize new books, periodicals, articles, and the like to generate topical card catalogues and search databases.

Similarly, topic extraction can help expand search queries in a correct environment. For example, "decision" from the above search query could also link to "choice," while "court" could link to "basketball" or "tennis." Expanding a search with these terms would clearly be expanding the search into an improper object world space. Thus determining that the topic is "taxes" and/or "legal" can help a search engine expand terms in an appropriate context.
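The topic-constrained expansion described in the search example can be sketched as follows. The link table `RELATED`, its topic labels and the function `expand_query` are hypothetical stand-ins for the relational and hierarchy links of the meaning world; only the example terms are taken from the text.

```python
# Hypothetical link table: term -> list of (related term, topic of that link).
RELATED = {
    "decision": [("opinion", "legal"), ("order", "legal"), ("choice", "general")],
    "court": [("judge", "legal"), ("basketball", "sports"), ("tennis", "sports")],
    "income tax": [("IRS", "taxes")],
}

def expand_query(terms, topics):
    """Add related terms, but only along links that belong to the detected topics."""
    expanded = []
    for term in terms:
        if term not in expanded:
            expanded.append(term)
        for related, topic in RELATED.get(term, []):
            if topic in topics and related not in expanded:
                expanded.append(related)
    return expanded

query_terms = ["court", "decision", "income tax"]
expanded_terms = expand_query(query_terms, topics={"legal", "taxes"})
# "basketball", "tennis" and "choice" are left out because their links
# lie outside the detected topics.
```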

Further embodiments are possible. As the disclosure offers a language independent representation of a text, the processing of that representation can take many different forms. Thus, the disclosed system can be used in many different applications. Further, various embodiments described herein can be combined as desired.

Fig. 13 illustrates a block diagram of one embodiment of a computing system 994 that may be used to implement certain systems and processes described herein. For example, in one embodiment the computing system 994 may be configured to receive translation requests from another computer system (such as a user PC 990a, 990b), use a meaning world implementation to translate the request into the proper language, and return the translation. The functionality provided for in the components and modules of computing system 994 may be combined into fewer components and modules or further separated into additional components and modules.

The computing system 994 includes, for example, a server or personal computer that is IBM, Macintosh, Linux/Unix compatible, or the like. In one embodiment, the computing device comprises a server, a laptop computer, a cell phone, a personal digital assistant, a kiosk, or an audio player, for example. In one embodiment, the exemplary computing system 994 includes a central processing unit ("CPU") 1095, which may include a conventional microprocessor. The computing system 994 further includes a memory 1097, such as random access memory ("RAM") for temporary storage of information and a read only memory ("ROM") for permanent storage of information, and a mass storage device 1098, such as a hard drive, diskette, or optical media storage device. Typically, the modules of the computing system 994 are connected to the computer using a standards based bus system. In different embodiments, the standards based bus system could be Peripheral Component Interconnect (PCI), MicroChannel, SCSI, Industrial Standard Architecture (ISA) and Extended ISA (EISA) architectures, for example.

The computing system 994 is generally controlled and coordinated by operating system software, such as Windows 95, Windows 98, Windows NT, Windows 2000, Windows XP, Windows Vista, Linux, SunOS, Solaris, or other compatible operating systems. In Macintosh systems, the operating system may be any available operating system, such as MAC OS X. In other embodiments, the computing system 994 may be controlled by a proprietary operating system. Conventional operating systems control and schedule computer processes for execution, perform memory management, provide file system, networking, and I/O services, and provide a user interface, such as a graphical user interface ("GUI"), among other things.

The exemplary computing system 994 includes one or more commonly available input/output (I/O) devices and interfaces 1096, such as a keyboard, mouse, touchpad, modem, Ethernet card, microphone, and/or printer. In one embodiment, the I/O devices and interfaces 1096 include one or more display devices, such as a monitor, that allows the visual presentation of data to a user. More particularly, a display device provides for the presentation of GUIs, application software data, and multimedia presentations, for example. The computing system 994 may also include one or more multimedia devices 1099, such as speakers, video cards, graphics accelerators, and microphones, for example. In an embodiment, a user enters text to be translated or processed through a keyboard or touchpad representation of a keyboard (input devices 1096). In another, a microphone (another input device 1096) accepts spoken text. The spoken text may be stored in any of a number of audio formats, such as for example, WAV, MP3, or other formats. The CPU 1095 may process this audio text and convert it to written text, such as a string data object, a plain text data file, a Microsoft® Word document, or the like.

In the embodiment of Fig. 13, the I/O devices and interfaces 1096 provide a communication interface to various external devices. In an embodiment, the computing system 994 is coupled to a network 992, such as a LAN, WAN, or the Internet, for example, (see Fig. 13) via a wired, wireless, or combination of wired and wireless, communication link. The network 992 communicates with various computing devices and/or other electronic devices via wired or wireless communication links. In the exemplary embodiment of Fig. 13, the network 992 is coupled to one or more user terminals or computing devices 990a, 990b. Computing device 990b can communicate the text input, in audio or written text formats, to computing system 994 for processing. In addition to the devices that are illustrated in Fig. 13, the network 992 may communicate with other data sources or other computing devices. In addition, the data sources may include one or more internal and/or external data sources. In some embodiments, one or more of the databases or data sources may be implemented using a relational database, such as Sybase, Oracle, CodeBase and Microsoft® SQL Server, as well as other types of databases such as, for example, a flat file database, an entity-relationship database, an object-oriented database, and/or a record-based database.

In the embodiment of Fig. 14, the computing system 994 also includes an application module that may be executed by the CPU 1095. In the embodiment of Fig. 13, the application module manages the meaning world models and data. This module may include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.

In general, the word "module," as used herein, refers to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, Java, Lua, C or C++. A software module may be compiled and linked into an executable program, installed in a dynamic link library, or may be written in an interpreted programming language such as, for example, BASIC, Perl, or Python. It will be appreciated that software modules may be callable from other modules or from themselves, and/or may be invoked in response to detected events or interrupts. Software instructions may be embedded in firmware, such as an EPROM. It will be further appreciated that hardware modules may be comprised of connected logic units, such as gates and flip-flops, and/or may be comprised of programmable units, such as programmable gate arrays or processors. The modules described herein are preferably implemented as software modules, but may be represented in hardware or firmware. Generally, the modules described herein refer to logical modules that may be combined with other modules or divided into sub-modules despite their physical organization or storage.

Referring to another exemplary embodiment of the invention some basic features of the invention are summarized in the following text. It should be regarded as an example which supports the understanding of the teaching of the invention.

In the following, the language processing system will be referred to as "Lingupedia", which is a brand of Lingupedia Investments Sarl, Luxembourg. Lingupedia is a modular system for automated translation of text.

Lingupedia employs a completely modular design for multilingual Natural Language Processing and multimodal interaction. Modules of any type can be combined into a working system being able to analyse, reason about, search in, translate and generate natural language. The system handles multimodal interaction: written and spoken natural language input and output, as well as output as language, speech, depiction or a combination of them. The modules are designed in a way to be reusable by different other programs, either within the Lingupedia system, e.g. by both analysis and generation, or by other software programs. Where possible, the modules are language independent, thus ensuring reusability. Well-defined interfaces and general interfacing programs manage the communication between the system components. By this design, every language can be translated into every other language. Languages to be translated can be even variations within a single language, e.g. Swiss German is translated into High German, or colloquial style is translated into formal style. The core features of Lingupedia are:

• modular: easy to handle, reusable, configurable

• web-based: accessible from everywhere

• software-ergonomically highly sophisticated: usable by everyone

• community-based: extendable by everyone

• universal: every language can be integrated

• visual-graphical core: language independent and cognitively adequate

The central idea of the Lingupedia system is an approach to model and simulate human cognitive processing for the optimization of natural language understanding and generation, translation, search engines, or other communication tasks.

Most algorithms are based on an orthographical form, that is, a mere symbol or byte string without any meaning. Even ontologies use this approach ("house is-a building"), sometimes with mathematical distances or spaces, but they always use these meaningless byte chains. The byte chains have the major disadvantage that they often have multiple meanings, or possibly no meaning at all: a dog may be a pet, a grab hook, a cramp iron, ...

A human cognition based approach like Lingupedia clearly separates syntax and semantics according to the human brain processes and it differentiates among the multiple meanings of words. Syntactic rules or language-dependent word forms are handled in specific components. Semantics is processed in a language independent layer, the Lingupedia Meaning World (LMW). This approach is based on recent findings in neurological research. An example: If several people with different languages are sitting together and there is an umbrella in this room, everyone "knows" that this is an umbrella. But this "knowing" does not mean that the word "umbrella" is activated in any way in the brain of a person present. Only for communication purposes is the object "umbrella" tagged with the language specific word. The people involved know the object without using language. If they want to go outside while it is raining, they activate the "tag" by means of the language-specific dictionary, but only to communicate with other humans: "May I take this umbrella?" or "Könnte ich diesen Schirm nehmen?"

The advantage of the Lingupedia approach is that the meaning is represented in a human way and is therefore language independent. Thus all natural languages can be added, since they are using the same meaning world. Lingupedia's claim is: Not only translation, but any task or software using natural language can be optimized by our approach. In the core component LMW, information can be added, processed and stored without special language syntax being necessary. As soon as an information unit is present in LMW, new languages can be added very easily by binding the syntactical representation to the language independent unit.

Also users' meanings can be stored language independently: E.g. a company which produces a special printer can derive this printer from a given printer template in LMW, adapt the derived printer with the specific parts and tag it with one or more languages. Images with parts' and features' descriptions can be easily derived from LMW for a defined language. Thus multilingual product information (like documentation, marketing information, or error reports) can be generated automatically from the language independent meaning world. Customer communication in various languages and forms (Emails, letters, phone calls) can be automated, i.e. analysed, interpreted, distributed to different departments and generated for response to the customer.

Besides this CRM (Customer Relationship Management) application, LMW is usable as a fast and efficient information search machine because, in terms of mental representation, it is closer to human knowledge representation than other approaches. Lingupedia's method is superior both to classical string based search (requiring exact matches on the level of orthographic form) and to recent semantic web search (requiring a special annotation of texts in which information is to be searched).

A core method within LMW is the tagging method. To simplify the navigation within LMW, the language specific so called tagging can be activated. If e.g. English tagging is activated and the user navigates to an umbrella, the tagging algorithm queries the English dictionary for an entry, displaying it for the user. Thus, users of another language receive assistance in finding the desired information.

Knowledge within the LMW is represented in different worlds. Generally, "Object world", "Structure trees/networks", "Action space", and "Attribute space" are used.

The Object world's main task is to represent objects which are usually represented by nouns in languages like German, English or Chinese. It consists of several two-to-n-dimensional spaces storing the objects (or prototypes of them) and orders them in meaningful combinations. These objects are organized in structure trees or networks. Humans organize knowledge about objects of the world and their relations in a meaningful structure. This organization is done in a non-uniform way. They use concepts and categories for storing and sorting information. Such a grouping of categories can exist for "electronic devices" (computer, printer, digital phone), or "papers" (letter, document, invoice).

The action space part of LMW is responsible for the representation of actions. Actions can be connected to any other unit in LMW, e.g. the unit that is tagged with the English "withdraw" or the German word "abheben" can be bound to the objects "person", "money" and "cashpoint" being the actors involved. Actions do not necessarily have to be verbs: an action bound e.g. to two companies can be tagged with "compete" or "being competitor". Such connections are called molecules.

The attribute space is structured in a straightforward way, also as far as usability is concerned. Most, if not all, attributes can be quantified in some natural manner. Sensory attributes like color, taste, size or pressure already have a one- to three-dimensional representation used in various contexts.

Parts of Lingupedia are also the following further representations and algorithms:

• Integration of external resources of knowledge representation

• Depictions for a natural presentation of units in LMW

• Graphical relations between units in LMW

• Clustering for topics in texts

• Statistical analysis for disambiguation

• Multidimensional scaling for computing similarities

Besides the components described above, there are the following parts of Lingupedia designed for the modelling of specific natural languages:

• language independent grammar editor for the definition of grammars for each language

• Lexi-Wikis for the definition of words for each language

• Dictionary with multi-purpose configurability

Usually dictionaries don't provide an exact meaning: A dictionary gives the following English-to-German translations for dog: Anschlag, Bauklammer, Finger, Gerüstklammer, Greifhaken, Hund, and German-to-English translations for Hund: canine, dog, hound. So, several different meanings are given for a single word. LMW can differentiate between these meanings in a sophisticated way. This means that there are first the language independent meaning representations like a hairy animal, which is English-tagged with the word "dog", or a special part of a gantry which is also tagged with the same orthographic form "dog" (in German "Gerüstklammer"). Thus, if a usual dictionary has 30,000 entries in English, LMW will need some 100,000 meaning representations. The language independent meaning can be resolved by looking at the context: Is this dog-tagged object used in a building site domain or is it combined with an action tagged with the verb bark or walk? Once the meaning is clear by finding the correct unit in LMW, translation or further processing can be done better than with any existing system.
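The context-based resolution of the "dog" example can be sketched as a simple overlap count between the context and the cues attached to each candidate meaning. The table `SENSES`, the meaning identifiers and the function `resolve_meaning` are assumptions made for illustration; the actual system is described as using the graphical meaning world rather than flat cue sets.

```python
# Hypothetical table: orthographic form -> candidate meaning units and context cues.
SENSES = {
    "dog": {
        "meaning:animal-dog": {"bark", "walk", "pet"},
        "meaning:scaffold-clamp": {"building site", "gantry", "scaffold"},
    }
}

def resolve_meaning(word, context):
    """Pick the meaning unit whose cues overlap most with the context, if any."""
    candidates = SENSES.get(word, {})
    context = set(context)
    scores = {meaning: len(cues & context) for meaning, cues in candidates.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # Without any supporting cue the word stays unresolved.
    return best if scores[best] > 0 else None

animal = resolve_meaning("dog", ["bark", "garden"])
clamp = resolve_meaning("dog", ["building site", "crane"])
unresolved = resolve_meaning("dog", ["umbrella"])
```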

Lingupedia can a) use and integrate existing external resources from the web and b) open all components of Lingupedia for public access - the language-specific syntactic components as well as the language independent semantic area LMW. Furthermore, right from the start and even more with a growing LMW, it is very easy to integrate the syntactic part of a new language because only a simple tagging has to be done. Lingupedia provides a linguistic toolkit for quick-and-easy tagging by nonexpert users without special knowledge and covering every human language.

In the following some details of the components will be described.

Within an object world, all semantic entities are represented in a language independent way. The representation is graphical, i.e. visualized in different forms. Semantic entities correspond to abstract or real-world objects that live in a sort of "archetypal" world. They are organized in two-to-n-dimensional spaces and in meaningful structures.

Simple objects can open up new worlds. For example, the moon can lead to another space like the orbit. Or the representation of a human being can lead to a space which simulates the cellular basis or body parts of human beings. Relations of objects can be represented for example in a town with buildings, parks, and gardens. A building can be a private, public or an office building. This building contains offices; offices contain objects like desks, computers, shelves, clocks or papers. That way objects are related spatially or functionally to a knowledge domain which is represented by an office or building. The objects might consist of parts; a clock for example can consist of a mechanical mechanism and a display with the parts hour and minute hand. The Euclidean distance of objects in this archetype world represents the dissimilarity between two objects. The Euclidean distance in the semantic space is not equivalent to real world Euclidean distance. It is based on dissimilarity or functional closeness.

LMW uses associative networks or directed trees as a knowledge representation. The user can go from an object, e.g. "document" object lying on the desk of the graphical world to a corresponding tree to find e.g. the object "monition". Every object can be bound to multiple structure trees, e.g. the paper-object to "papers" and also to a tree of "materials" with sister nodes like wood, metal etc.

In LMW, there are different types of relations within networks: One type of relation can be "is-a". Here, an object is a subtype of the parent node comprising the supertype. Subtypes inherit properties of their supertypes. Multiple inheritance is possible. This "is-a" tree is used for the translation of subtypes which don't have a tagging in the target language. Instead of the specific term, the more general supertype is verbalized ("document" instead of "letter", "take" instead of "withdraw") or a synonym or the negated phrasing of an antonym is chosen. Besides hyponymy, other relations are also used for deduction and translation: closeness, relatedness, instance-of, member-of, frame-relatedness, similar-to, synonymy, antonymy, meronymy, etc. Languages differ in their lexical inventory. This network of relations allows a flexible way of generating natural language within a system which has to handle every language and where the various languages lack certain words either for language-immanent reasons or because they are not yet tagged in the Lingupedia system. Units in LMW can be artificial in the sense that they are only part of structure trees. This holds for some relations or constructed nodes.

Especially for actions, a visualized representation is cognitively adequate since a verbalized definition is hard to understand and less intuitive for users than a visual one. Movies, graphics displaying motions or schematic depictions are used to illustrate the different actions. The action space is also used for the representation of thematic roles of verbs or other entities. The thematic roles relate an action to its agents, themes, goals etc. The roles are either defined by the user or deduced from the properties of the action displayed graphically. This is an elegant and intuitive way of assigning an internal, thematic structure to actions and events. This knowledge about the roles involved is used for disambiguation and for correct generation of the target sentence.
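The use of the "is-a" relation for untagged subtypes can be sketched as a walk up the hierarchy until a tagged unit is found. Everything named here is hypothetical: the tables `PARENT` and `TAGS`, the target language code "xx" and its placeholder terms, and the function `verbalize`. For brevity the sketch assumes a single parent per unit, although the text allows multiple inheritance.

```python
# Hypothetical "is-a" links: unit -> supertype (single parent for simplicity).
PARENT = {"letter": "document", "document": "paper", "withdraw": "take"}

# Hypothetical tags of an illustrative target language "xx" that lacks
# words for "letter" and "withdraw".
TAGS = {"xx": {"document": "xx-document", "take": "xx-take"}}

def verbalize(unit, language):
    """Return the tag of `unit`, or of its nearest tagged supertype, or None."""
    seen = set()
    while unit is not None and unit not in seen:
        seen.add(unit)  # guards against accidental cycles in the links
        term = TAGS.get(language, {}).get(unit)
        if term is not None:
            return term
        unit = PARENT.get(unit)
    return None

letter_term = verbalize("letter", "xx")      # falls back to the supertype "document"
withdraw_term = verbalize("withdraw", "xx")  # falls back to the supertype "take"
```

A fuller version would also try synonyms or negated antonyms, as the text mentions, before or after climbing the hierarchy.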

Attributes of objects can be an emotional scale, a colour representation or physical attributes like temperature, size or quality. A stock e.g. can be bound to a two-dimensional space that represents a number scale representing currency units. Other units like actions can be bound to this space which may be tagged with "raise" or "drop". Like the action space, the attribute space can be connected to other units in LMW. A colour can be bound to an object which is tagged with "car". The attribute space itself can be multidimensional. Attributes can represent a structure tree, e.g. "scarlet", "carmine", and "crimson" are subtypes of "red". That way the units of the meaning world are connected to each other in multiple ways in a network allowing complex deductions necessary for processing natural language.

External resources can be linked into the system so that knowledge representations available on the Internet can be used within the components of the system. Resources to be linked are for example DBpedia, Wiktionary, Open Street Map, scientific taxonomies, ontologies from the Semantic Web®, users' own taxonomies, etc. Highly sophisticated consistency check components verify the consistency of the different representations and enable correct computing over the entirety of the heterogeneous knowledge sources. Even different media types can be integrated such as graphics, videos, and audio. Different interpretation or translation algorithms allow handling various types of representations.

Avatars represent human beings or animals. Avatars are - like all objects in LMW - derived from others. Thus an inherent hierarchy by means of derived objects is given. The thinking of human beings works - this is a hypothesis of the authors - also in finite world simulations: If humans imagine withdrawing money from a cashpoint they don't use the words "I, bank, cashpoint, withdraw". Instead, they use a language independent "mental image" or "mental scene" to imagine the process. They can even simulate whole stories - e.g. in dreams - without using their body. They imagine their body in an artificial, brain-simulated environment. Later on, the LMW shall serve as a platform for such simulations by means of artificial intelligence.

Various relations are also modelled graphically. Spatial, temporal, causal or metaphorical relations between entities (and also the other types of relations) are ideally suited for a graphical depiction. For translation, these kinds of relations are the basis for determining by which structures and wordings they are to be expressed verbally, because languages differ in the way they express these relations: some languages use prepositions, others realize them as morphemes attached to a noun, etc. Starting from a neutral, abstract and graphical representation is the best way to generate adequate structures and wordings, and is cognitively adequate. By this method, the generation component doesn't need to do a complex restructuring of the input structure (as is done by classical machine translation systems), but simply chooses between the available structures of the target language using a mapping from relations to structures. These mapping algorithms have been developed for the generation of every language to be integrated.

Knowledge about topics improves the translation by filtering out ambiguous meanings that do not belong to these topics. For several topics, there will be many clusters in the N-dimensional semantic space. Efficient and fast clustering algorithms, such as the k-means clustering algorithm, are used to find the cluster centers. These cluster centroids represent the topics of the text. If there are ambiguous translations, the topic can be used to resolve them.
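A minimal k-means sketch is given below to illustrate how cluster centroids could be obtained from positions in the semantic space. The sample points, the initial centroids and the function `k_means` are illustrative assumptions; the text does not specify initialization, the number of iterations or the dimensionality.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroid
            for members, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Made-up 2-D positions of semantic entities found in a text.
positions = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
topic_centroids, topic_clusters = k_means(positions, centroids=[(0, 0), (10, 10)])
```

Each resulting centroid stands for one topic; an ambiguous term would then be resolved in favour of the meaning that lies closest to one of these centroids.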

The syntactic analysis of the text often creates many syntax graphs and some unresolved connections between graph nodes. A statistical approach based on Bayes' theorem is used to choose the best graph. It states that the probability of a certain graph given the evidence (the semantic entities) is proportional to the likelihood of the semantic entities occurring in that graph times the prior probability of that graph.
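The selection of the best graph can be sketched with Bayes' theorem in its proportional form, posterior ∝ likelihood × prior. The candidate names, the probability values and the function `rank_graphs` are made up for illustration; how priors and likelihoods are estimated is not specified in the text.

```python
def rank_graphs(candidates):
    """candidates maps a graph name to (prior, likelihood of the evidence given the graph).

    Returns the most probable graph and the normalized posteriors.
    Assumes at least one candidate has a non-zero prior and likelihood.
    """
    unnormalized = {name: prior * likelihood
                    for name, (prior, likelihood) in candidates.items()}
    total = sum(unnormalized.values())
    posteriors = {name: value / total for name, value in unnormalized.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors

# Illustrative numbers only: graph_a is more common a priori, but the
# observed semantic entities fit graph_b much better.
best_graph, posteriors = rank_graphs({
    "graph_a": (0.6, 0.1),
    "graph_b": (0.4, 0.5),
})
```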

Part of the Lingupedia system is the world's first language independent grammar editor: users without any programming knowledge can write grammars. Only a certain formalization of the possible structures of the language at hand is required. By that, time-consuming development of different grammars for every single language can be avoided and instead, quick and efficient prototyping is possible. That way new languages can be plugged in quickly and easily. The grammars are used by both the language analysis and generation components. This concept of modularity and reusability of components is applied to the following syntactic representations and processes:

• language independent, i.e. universal, abstract representation of grammatical structure

• grammars for analysis and generation

• syntactic-morphological rules for analysis and generation

Graphical user interfaces, called Lexi-Wikis, allow users to enter words into the language-specific lexicon. Lexi-Wikis do not require any expert knowledge about the language at hand, but are designed to be usable by everybody. From the respective words, the tools generate example sentences to be simply selected or modified by the user. Which word forms, and how many of them, have to be presented to the user is determined by different language-specific inflectional algorithms. The user-selected examples are translated into a complex representation which can be processed by the program. The underlying morphological method uses linguistic knowledge and frequency information to determine the minimum of information the user has to provide. It thus anticipates the most probable word forms so that as few word forms and as few actions as possible are required from the user. By this method, the mental load or the intelligence is transferred from the user side to the software side.

The dictionary method is designed as a universal, multi-purpose master dictionary for all sorts of natural language applications and for all types of languages. The dictionary proposes a new level of representation: the phrase level, which is situated between single words and complete sentences. The units of a language can thus be handled in a very flexible way on a continuum word - phrase - sentence. Multiword expressions, so far a major problem for most natural language systems, can be represented in a more or less fixed structure: from being non-varying and unmodified (having a fixed form and no internal structure), to having an internal structure with certain restrictions (semantic, syntactic, lexical, pragmatic, stylistic etc.), up to being open for modifications of any type.
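The three degrees of fixedness of a phrase-level entry can be sketched as below. The class names, the `allowed_modifiers` field and the matching rule are hypothetical. The sketch compares lower-cased tokens directly and does not model inflection or the semantic, syntactic or stylistic restrictions mentioned above.

```python
from dataclasses import dataclass
from enum import Enum


class Flexibility(Enum):
    FIXED = "fixed"            # fixed form, no internal structure
    RESTRICTED = "restricted"  # internal structure, only listed modifiers allowed
    OPEN = "open"              # any modification allowed


@dataclass(frozen=True)
class PhraseEntry:
    tokens: tuple
    flexibility: Flexibility
    allowed_modifiers: frozenset = frozenset()


def matches(entry: PhraseEntry, words) -> bool:
    """Check whether a word sequence realizes the phrase entry."""
    words = [w.lower() for w in words]
    if entry.flexibility is Flexibility.FIXED:
        return tuple(words) == entry.tokens
    position = 0  # index of the next entry token still to be found
    for word in words:
        if position < len(entry.tokens) and word == entry.tokens[position]:
            position += 1
        elif (entry.flexibility is Flexibility.RESTRICTED
              and word not in entry.allowed_modifiers):
            return False  # an extra word that is not a licensed modifier
    return position == len(entry.tokens)


fixed = PhraseEntry(("by", "and", "large"), Flexibility.FIXED)
idiom = PhraseEntry(("kick", "the", "bucket"), Flexibility.RESTRICTED,
                    frozenset({"proverbial"}))
free = PhraseEntry(("make", "decision"), Flexibility.OPEN)

print(matches(fixed, "by and very large".split()))          # False
print(matches(idiom, "kick the proverbial bucket".split()))  # True
print(matches(free, "make a quick decision".split()))        # True
```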

The dictionary method provides a mechanism for annotating the entries with features usable for various natural language applications: morphological features for morphological analysis and generation, syntactic features for syntactic analysis and generation, semantic features for semantic processing, pragmatic features for pragmatic processing, and dialogue-related features for the efficient design of natural language dialogues. To explain the feature-based method: processing natural language by using its surface forms (the strings) is not ideal, since every variation and every equivalent or related form has to be handled separately. This approach is inefficient: it is laborious and error-prone for the programmer, and it gives the user no flexibility in interacting with the software, e.g. in a dialogue, where he/she has to use the exact strings the software is prepared for; otherwise he/she is not understood at all. By using features, a higher level of scientific abstraction is employed, resulting in a more flexible and more natural way of interaction.
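The difference between string matching and feature-based processing can be sketched with a small dialogue example. The feature name `dialogue_act`, its values and the lexicon entries are assumptions for illustration; the source does not list concrete features.

```python
# Hypothetical lexicon: surface form -> feature bundle.
FEATURE_LEXICON = {
    "yes": {"category": "interjection", "dialogue_act": "affirm"},
    "yeah": {"category": "interjection", "dialogue_act": "affirm"},
    "sure": {"category": "adverb", "dialogue_act": "affirm"},
    "no": {"category": "interjection", "dialogue_act": "deny"},
    "nope": {"category": "interjection", "dialogue_act": "deny"},
}


def dialogue_act(utterance: str):
    """Return the first dialogue act found in the utterance, or None."""
    for token in utterance.lower().split():
        features = FEATURE_LEXICON.get(token.strip(".,!?"))
        if features and "dialogue_act" in features:
            return features["dialogue_act"]
    return None


# The dialogue logic tests a feature, not a list of exact strings.
print(dialogue_act("Yeah, sure!"))  # affirm
print(dialogue_act("Nope."))        # deny
```

Adding a new way of saying "yes" then only requires a new lexicon entry with the same feature; the dialogue logic itself stays unchanged.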

Besides linguistic information used for written language interaction, the dictionary also stores information about the pronunciation of words, which is usable for audio input and output, that is, for both speech recognition and synthesis. Conversion algorithms are integrated. They translate the internal form of the pronunciation representation into another one to be further processed by various types of software or to be presented to the user. Thus, this information is usable for different applications in a flexible way. A configuration tool allows selecting exactly the parts of the dictionary which are needed by different applications.
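A conversion of this kind can be sketched as a symbol-by-symbol mapping. The internal format is not disclosed in the source; the sketch assumes an ARPABET-like notation with stress digits and converts it to IPA for a handful of symbols. The function `select_fields` is a hypothetical stand-in for the configuration tool.

```python
# Partial, illustrative symbol table (ARPABET-like -> IPA).
INTERNAL_TO_IPA = {"K": "k", "AE": "æ", "T": "t", "S": "s", "P": "p"}


def to_ipa(internal: str) -> str:
    """Convert a space-separated internal pronunciation into an IPA string."""
    symbols = []
    for symbol in internal.split():
        core = symbol.rstrip("012")  # drop the stress digit, if any
        symbols.append(INTERNAL_TO_IPA[core])
    return "".join(symbols)


def select_fields(entry: dict, fields: set) -> dict:
    """Keep only the parts of a dictionary entry an application needs."""
    return {key: value for key, value in entry.items() if key in fields}


entry = {"lemma": "cat", "category": "noun", "pronunciation": "K AE1 T"}

print(to_ipa(entry["pronunciation"]))                    # kæt
print(select_fields(entry, {"lemma", "pronunciation"}))  # view for a speech application
```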

By storing base forms instead of full forms (the latter being common for speech-related software), the dictionary employs a both efficient and flexible form of representation and processing and allows the dynamic generation of all possible inflectional, derivational and compound forms. A generation algorithm which produces the different word forms while ensuring the correct pronunciation derived from and adapted to the internal structure of words is part of the system. The dictionary also provides a method for representing various relations between lexicon entries. The relations serve different language-processing tasks. For example, an abbreviation is normally used in written language but not in speech; if it is to be used in speech synthesis, its full form is represented to make it pronounceable. Or, if an entry is to be found by a search engine, its various orthographic and inflectional forms are irrelevant for the job of searching, although until now they had to be represented explicitly. With Lingupedia's approach they are related and can easily be found.
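Generating a word form together with its adapted pronunciation from a stored base form can be sketched for the regular English plural. The rule set is a simplified assumption and not the algorithm of the described system: irregular plurals and spelling changes such as "y" to "ies" are not handled, and phonemes are represented as a plain list of IPA symbols.

```python
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}


def plural(base: str, phonemes: list):
    """Return (spelling, phonemes) of the regular plural of an English noun."""
    # Spelling depends on the written ending of the base form.
    if base.endswith(("s", "x", "z", "ch", "sh")):
        spelling = base + "es"
    else:
        spelling = base + "s"

    # Pronunciation of the ending depends on the final sound of the base form.
    last = phonemes[-1]
    if last in SIBILANTS:
        ending = ["ɪ", "z"]
    elif last in VOICELESS:
        ending = ["s"]
    else:
        ending = ["z"]
    return spelling, phonemes + ending


print(plural("cat", ["k", "æ", "t"]))  # ('cats', ['k', 'æ', 't', 's'])
print(plural("dog", ["d", "ɒ", "g"]))  # ('dogs', ['d', 'ɒ', 'g', 'z'])
print(plural("bus", ["b", "ʌ", "s"]))  # ('buses', ['b', 'ʌ', 's', 'ɪ', 'z'])
```

Only the base form and its pronunciation are stored; the plural and its pronunciation are produced on demand, which is the point of preferring base forms over full-form lists.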

Many modifications and other embodiments of the invention set forth herein will come to the mind of one skilled in the art to which the invention pertains having the benefit of the teaching presented in the foregoing description and the associated drawings. Therefore, it is to be understood that the invention is not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

1. Method for processing natural language using a language processing system, wherein written or spoken text is input to said language processing system, characterized by the steps of analyzing said text regarding its syntax and morphology, extracting components of the text and their relation relative to each other, generating or using a graph or graphical representation of said text as language independent representation of the meaning of said text, and performing processing of said text using said graph or graphical representation.

2. Method according to claim 1, wherein said text is modeled in a visual-graphical way, wherein the visual-graphic model is language-independent, thus enabling users to extend the language processing system without having knowledge about the languages involved.

3. Method according to claim 1 or 2, wherein grammatical data used at the step of analyzing is entered to the language processing system by means of a language-independent grammar editor.

4. Method according to any one of claims 1 to 3, wherein the step of analyzing is performed by a syntactic layer of the language processing system which performs segmentation and tokenization of said text, wherein the syntactic layer can be docked to the language processing system.

5. Method according to claim 4, wherein each language which should be processed by the language processing system is represented in a separate syntactic layer, whereby abstractions of other languages can be re-used within the single syntactic layers.

6. Method according to claim 4 or 5, wherein further abstraction and generalization of data generated by the syntactic layer is performed by a relational layer, wherein said data preferably describes relations between objects and abstractions of that.

7. Method according to any one of claims 4 to 6, wherein language independent information of said text is extracted at said syntactic and relational layers, wherein the language independent information is sent to a semantic layer, and wherein the language independent information comprises objects, actions, and attributes.

8. Method according to any one of claims 1 to 7, wherein at the step of generating the graph or graphical representation, objects, actions, and attributes of a sentence or phrase are linked together.

9. Method according to claim 8, wherein the objects, actions, and attributes are represented graphically.

10. Method according to any one of claims 1 to 9, wherein the step of processing comprises the step of reasoning over the information within a meaning world model, thereby checking the extracted semantics of said text for consistency.

11. Method according to any one of claims 1 to 10, wherein the step of processing comprises the step of generating a translation of said text in a language different to the original language of said text, wherein said graph or graphical representation is the basis of the translation.

12. Method according to any one of claims 1 to 11, wherein the step of processing comprises the step of analyzing said text for purpose of searching or other language processing tasks.

13. Method according to any one of claims 10 to 12, wherein the step of processing comprises the step of generating a response to said text using information given in the meaning world model.

14. Method according to any one of claims 1 to 13, wherein the text generated at the step of processing is output to the user as written or spoken language or as depiction.

15. Method according to any one of claims 1 to 14, wherein knowledge used at the single steps is input using a web interface designed for usability by everyone, wherein the knowledge may include lexicon tags, content of a meaning world model, grammar information, and attribute representation.

16. System for processing natural language comprising: a language-independent module, wherein the module manipulates a plurality of objects representing terms and relationships between the objects; a plurality of language-dependent dictionary modules, each dictionary module having a plurality of entries, each entry of the dictionaries linked to one of the plurality of objects stored in the language-independent module; a text parser associated with one or more of the language-dependent dictionary modules; and a sentence generator associated with one or more of the language-dependent dictionary, wherein the text parser accepts input, extracts key terms from the input, and uses a graph representation of the key terms based on the plurality of objects from the language-independent core module and wherein the sentence generator formulates output text in one of its associated languages based on the graph representation.

17. System according to claim 16, wherein the input comprises written text or oral text.

18. System according to claim 16, comprising at least one linguistic syntactic module associated with one or more of the language-dependent dictionaries, the linguistic syntactic module including the text parser, a set of grammar rules, and a set of templates.

19. System according to claim 18, wherein each language-dependent dictionary is associated with a different linguistic syntactic module.

20. System according to claim 18, wherein the language-dependent dictionaries of at least two closely related languages are associated with the same linguistic syntactic module.

21. System according to claim 16, wherein the language-independent module further stores media representations of associated terms stored in the module.

22. System according to claim 21, wherein the media representations comprise pictures, sounds, or videos.

23. System according to claim 21, further comprising an editing component module, wherein the editing component module facilitates the alteration of the dictionary entries and their links to the one of the plurality of objects stored in the language-independent module.

24. System according to claim 23, wherein the editing component module is further adapted to facilitate adding entries to the plurality of language-dependent dictionaries.

25. System according to claim 23, wherein the editing component module is adapted to be accessible from a website.

26. System according to claim 25, wherein the media representations of terms can be displayed within a virtual world.

27. System according to claim 23, wherein the editing component module restricts access to qualified users.

28. A method of developing a language processing system comprising: developing a language-independent core, the core comprising language term objects, each language term object comprising media representations of language terms, and links between associated language terms; adding a dictionary object, associated with a specific language; adding words from the specific language to the dictionary object; and linking the words to the appropriate core language term objects.

29. Method according to claim 28, wherein the links between associated language terms include relational links and hierarchy links.

30. Method according to claim 28 comprising creating a language parser for the specific language based on grammar and syntax rules.