EP1397754A1 - Cross-idea association database creation - Google Patents

Cross-idea association database creation

Info

Publication number
EP1397754A1
EP1397754A1 EP02744486A EP02744486A EP1397754A1 EP 1397754 A1 EP1397754 A1 EP 1397754A1 EP 02744486 A EP02744486 A EP 02744486A EP 02744486 A EP02744486 A EP 02744486A EP 1397754 A1 EP1397754 A1 EP 1397754A1
Authority
EP
European Patent Office
Prior art keywords
word
document
words
ofthe
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP02744486A
Other languages
English (en)
French (fr)
Other versions
EP1397754A4 (de)
Inventor
Eli Abir
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/44 Statistical methods, e.g. probability models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/45 Example-based machine translation; Alignment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Definitions

  • This invention relates to a method and apparatus for creating a cross-idea association database for converting, manipulating, and/or translating information from one state to a second state.
  • the two states represent word languages (e.g., English, Hebrew, Chinese, etc.) such that the present invention creates a cross-language database correlating words and phrases in one language to their translation counterparts in a second language.
  • the present invention creates a database by examining documents in the two languages and creating a database of translations for each word or phrase in both languages.
  • the present invention need not be limited to language translation.
  • the present invention allows a user to create a database of ideas, and associate those ideas to other, differing ideas in a hierarchical manner. Thus, ideas are associated with other ideas and rated according to the frequency of the occurrence. The specific weight given to the occurrence frequency, and the use applied to the database thus created, can vary depending upon the user's requirements.
  • the present invention will operate to create foreign language translations of words and strings of words in the English language.
  • the present invention will return a ranking of associations to those words (or strings of words); e.g., the word occurring the most often will be the foreign language equivalent of the word (in English), given a large enough sample size.
  • the present invention will also return other foreign language associations with the English word, and the user may manipulate those associations as desired.
  • the word "mountain," when operated on according to the present invention, may return a list of foreign language words in the language being examined.
  • the present invention is an automated association database creator.
  • the strongest associations represent “translations” in one sense, but other frequent (but weaker) associations represent ideas that are closely related to the idea being examined.
  • the purpose of the present invention is to develop a database of associations of words and phrases (strings of words) between one language and a second language. In general, the method involves examining and operating on two documents, each containing text which represents the same concept or content, but in two different languages.
  • the method and apparatus of the present invention are utilized such that a database is created with associations across the two languages - translations, or more specifically, possible associations for words and phrases.
  • the translation and other relevant associations for words and phrases between the two languages become stronger, i.e. more frequent, as more documents are examined and operated on by the present invention, such that by operation on a large enough "sample" of documents the most common (and, in one sense, the correct) associations become apparent and the method and apparatus can be utilized for translation purposes.
  • the preferred embodiment of the present invention utilizes a computing device such as a personal computer system of the type readily available in the prior art. However, the method and apparatus of the present invention do not need to use such a computing device and can readily be accomplished by other means, including manual creation of the cross-associations.
  • the method by which successive documents are examined to enlarge the "sample" of documents and create the cross-association database is varied - the documents can be set up for analysis and manipulation manually, by automatic feeding (such as automatic paper loaders as known in the prior art), or by using search techniques on the Internet to automatically seek out the related documents.
  • the present invention may be utilized on a common computer system having at least a display means, an input method, an output method, and a processor.
  • the display means can be any of those readily available in the prior art, such as cathode ray terminals, liquid crystal displays, flat panel displays, and the like.
  • the processor means also can be any of those readily available and used in a computing environment such that the means is supplied to allow the computer to operate to perform the present invention.
  • an input method is utilized to allow the input of the documents for the purposes of building the cross-association database; as described above, the specific input method can vary depending on the needs of the user.
  • the documents are examined for the purpose of building the database.
  • the creation process begins using the methods and/or apparatus described herein.
  • Document A is in language A
  • Document B is in language B.
  • the documents have the following text:
  • the first step in the present invention is to calculate a word range to determine the approximate associations for any given word or phrase. Since a word-for-word translation is not appropriate (i.e., word 1 in document A most likely will not exist as the literal translation of word 1 in document B), the database creation technique of the present invention tests each word in the first language against a range of words in the second language. This range thus is developed by examining the two documents, and is used to compare the words, phrases, or other word strings in the SECOND document against the words, phrases, or other word strings in the FIRST document. That is, a range of words (or phrases, or word strings) in the second document is applied as a possible match against any one word (or phrase, or word string) in the first document. By testing against a range, the database creation technique establishes a number of second language words that may equate and translate to the first language words.
  • the value of the range is, ultimately, user defined.
  • Various techniques can be used to determine the value of the range, including common statistical techniques such as the derivation of a bell curve based on the number of words in a document. With a statistical technique such as a bell curve, the range at the beginning and end of the document will be smaller than the range in the middle of the document.
  • a bell-shaped frequency for the range words allows reasonable extrapolation of possible word translations, whether it is derived according to the number of words in a document or according to the percentage of coverage of number of words desired.
  • the value of the range may depend on the number of words in the two documents. If the word counts of the two documents are equal, any value may be given. Applying statistical techniques, a bell curve may be created such that the range is a lower number of words at the beginning of the document, the highest number of words in the middle of the document, and a lower number of words at the end of the document.
  • a ratio may be used to correctly position the range. For example, if document A has 75 words and document B has 100 words, the ratio between the two documents is 3:4.
  • the mid-point of document A is word position 37 (or 38); however, using this mid-point (word position 37 or 38) as the placement for the largest value of the range (if determined according to a bell curve technique) in document B is not effective, since this position (word position 37 or 38) is not the midpoint of document B.
  • the point of maximum application of the range value in document B may be determined by the ratio of words between the two documents, by manual placement in the mid-point of document B, or by other techniques.
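The bell-curve range and the ratio-based positioning described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the application: the Gaussian weighting, the spread `sigma = n_words / 4`, the round-half-up rule, and the names `bell_range` and `map_position` are all hypothetical choices made for the example.

```python
import math


def bell_range(position, n_words, max_range):
    """Range value at a 1-based word position: small at the ends, largest mid-document."""
    if n_words <= 1:
        return 1
    centre = (n_words + 1) / 2
    sigma = n_words / 4  # assumed spread; the source leaves the curve to the user
    weight = math.exp(-((position - centre) ** 2) / (2 * sigma ** 2))
    return max(1, round(max_range * weight))


def map_position(position, n_words_a, n_words_b):
    """Scale a 1-based position in document A to the matching position in document B."""
    scaled = math.floor(position * n_words_b / n_words_a + 0.5)  # assumed rounding rule
    return min(max(scaled, 1), n_words_b)


# Ten-word document, maximum range of two words.
ranges = [bell_range(p, 10, 2) for p in range(1, 11)]

# 75-word document A and 100-word document B (ratio 3:4).
peak_in_b = map_position(38, 75, 100)
```

Under these assumptions the ten-word document gets a range of one at both ends and two around the middle, and the mid-point of the 75-word document (position 38) lands at position 51 of the 100-word document rather than at position 38.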
  • association frequencies for each possible translation.
  • the database creation technique of the present invention returns a possible set of words in the second-language document that translate to the word in the first document.
  • the possible set of words will be narrowed and an association frequency will be developed that will assist in the determination of the potential translation.
  • the present invention will create association frequencies for a word (or phrase, or word string) in one language to that same word (or phrase, or word string) in a second language.
  • the cross-language association database creation technique will return higher and higher association frequencies for any one word, phrase or word string.
  • the highest association frequency after a large enough sample is reviewed results in a translation; of course, the ultimate point where the association frequency is deemed to be an accurate translation is user defined and subject to other interpretive translation techniques (such as those described in Provisional Application No. 60/276,107, entitled "Method and Apparatus for Content Manipulation" filed on March 16, 2001 and incorporated herein by reference).
  • association frequencies could result for the Spanish equivalent to the English word "friend": "gato" - 25%; "burro" - 15%; and "amigo" - 60%.
  • operation of the present invention will increase the association frequency for "amigo" and decrease the association frequencies for "gato" and "burro."
  • the association frequency will reach a level such that a translation is deemed to have occurred, such that the word "friend" in English translates to "amigo" in Spanish.
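The "friend" example can be illustrated with a small counting sketch. The function names, the 65% acceptance threshold, and the additional counts are assumptions made for illustration only; the source leaves the acceptance point to the user.

```python
from collections import Counter


def association_frequencies(counts):
    """Turn raw association counts into relative association frequencies."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def accepted_translation(counts, threshold):
    """Return the top association once its frequency reaches the threshold, else None."""
    freqs = association_frequencies(counts)
    if not freqs:
        return None
    best = max(freqs, key=freqs.get)
    return best if freqs[best] >= threshold else None


# Counts chosen to reproduce the 25% / 15% / 60% figures for "friend".
friend = Counter({"gato": 25, "burro": 15, "amigo": 60})
before = accepted_translation(friend, threshold=0.65)

# Hypothetical counts gathered from further document pairs.
friend.update({"amigo": 40, "gato": 5})
after = accepted_translation(friend, threshold=0.65)
```

With the initial counts no association clears the assumed threshold. After the hypothetical extra document pairs, the share of "amigo" rises while the shares of "gato" and "burro" fall, and "amigo" is accepted.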
  • the invention tests not only words but phrases, or strings of words (multiple words).
  • the database creation technique of the present invention analyzes a two-word word string, then a three-word word string, and so on in an incremental manner. This technique makes possible the translation of phrases or word strings in one language into one word in another, as often occurs.
  • the analysis stops when all positions for the word or word string have been analyzed, if the number of words (or word strings) is greater than one. If a word only occurs once in a document, the process immediately proceeds to increment a word and return a word string. When a word string only occurs once, the process cycles back to the second word in the document, where the analysis cycle occurs again as described above.
  • the incrementing, testing and return process occurs in a similar manner for word strings.
  • the number of occurrences for any phrase is examined, phrases are returned based on the range, and a database is created of possible translations for that phrase.
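The incremental growth of word strings could be sketched like this. The helper names and the seven-word toy document are hypothetical; the sketch only shows the stopping rule (a word string that occurs once ends the cycle for that starting word).

```python
def positions_of(words, phrase):
    """1-based start positions at which `phrase` (a list of words) occurs in `words`."""
    n = len(phrase)
    return [i + 1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase]


def repeated_strings_from(words, start):
    """Grow the word string beginning at 1-based `start` one word at a time.

    Strings are kept while they occur more than once; the first string that
    occurs only once stops the cycle for this starting word.
    """
    results = []
    for end in range(start, len(words) + 1):
        phrase = words[start - 1:end]
        found = positions_of(words, phrase)
        if len(found) < 2:
            break
        results.append((" ".join(phrase), found))
    return results


# Hypothetical toy document, not taken from the source.
toy = "a b c a d b c".split()
from_first = repeated_strings_from(toy, 1)
from_second = repeated_strings_from(toy, 2)
```

Stopping at the first single occurrence is safe here because any longer string that begins with a once-only string can itself occur at most once.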
  • the present invention can operate in such a manner so as to analyze word strings that depend on the correct positioning of words (in that word string), and can operate in such a manner so as to account for grammatical idiosyncrasies such as phrasing, style, or abbreviations.
  • the present invention can accommodate different variations that occur in documents where subsets of words occur within larger word strings. For example, proper names are sometimes presented complete (as in "John Doe"), abbreviated by first or surname ("John" or "Doe"), or abbreviated by another manner ("Mr. Doe").
  • the present invention accounts for these patterns by recognizing, through the analysis, the existence of these patterns in the association database, and manipulating the frequency return. Since the present invention will most likely return more individual word returns than word string returns (i.e., more returns for the first or surnames rather than the full name word string "John Doe"), because the words that make up a word string will necessarily be counted individually as well as part of the phrase, a change in ranking may be utilized.
  • Step 1: a range is determined.
  • the range may be user defined or may be approximated by a variety of methods.
  • the word count of the two documents is approximately equal (ten words in document A, eight words in document B); a range value of three (thirty percent of the words in document A) may provide the best results.
  • the range will be one at the beginning and end of the document, and two in the middle.
  • the range (or the method used to determine the range) may be entirely user defined.
  • the range will vary from one word, to two words, to one word as the database creation technique of the present invention is utilized.
  • Step 2: the first word in document A is examined and tested against document A to determine the number of occurrences of that word in the document.
  • the first word in document A is X: X occurs three times in document A, at positions 1, 4, and 9.
  • the position numbers of a word, phrase, or other word string are simply a notation of the number of times that word, phrase, or word string is present in the document, and the location of that word, phrase, or word string in the document relative to other words.
  • the position numbers correspond to the number of words in a document, ignoring punctuation - for example, if a document has ten words in it, and the word "king" appears twice, the position numbers of the word "king" are merely the places (out of ten words) where the word appears. Because word X occurs more than once in the document, the process proceeds to the next step. If word X only occurred once, then that word would be skipped and the process expanded to the next word string (or phrase) and the creation process continued.
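Position numbers as described above can be computed with a few lines. The tokenising rule (`\w+`, lower-cased) and the ten-word sample sentence are assumptions for illustration; the source only states that punctuation is ignored.

```python
import re


def word_positions(text, word):
    """1-based positions of `word` among the words of `text`, ignoring punctuation."""
    tokens = re.findall(r"\w+", text.lower())  # assumed tokenisation rule
    return [i + 1 for i, token in enumerate(tokens) if token == word.lower()]


# Hypothetical ten-word document in which "king" appears twice.
sample = "The king is dead, they said. Long live the king!"
king_positions = word_positions(sample, "king")
```

Here "king" is found at positions 2 and 10 of the ten words, so it occurs more than once and would go on to the next step.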
  • Step 3: Possible second language translations for first language word X at position 1 are returned: applying that range to document B yields words at positions 1 and 2 (1 +/- 1) in document B: AA and BB (located at positions 1 and 2 in document B). All possible combinations of these words are returned as a potential translation for X: AA, BB, and AA BB (as a word string combination). The word string combination is returned as a possible match to accommodate the fact that a word in one language may equate to a phrase in the second language.
  • X1: the first occurrence of word X
  • Step 4: The next position of word X is analyzed. This word (X2) occurs at position 4. Since position 4 is near the center of the document, the range (as determined above) will be two words. Possible translations are returned by looking at word 4 in document B and applying the range (2) - hence, two words before word 4 and two words after word 4 are returned. Thus, words at positions 4 +/- 2 are returned, or at positions 2, 3, 4, 5, and 6. These positions correspond to words BB, CC, AA, EE, and FF in document B.
  • X2 returns BB, CC, AA, EE, FF, BB CC, BB CC AA, BB CC AA EE, BB CC AA EE FF, CC AA, CC AA EE, CC AA EE FF, AA EE, AA EE FF, and EE FF as associations.
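The returns listed for X1 and X2 are the contiguous word strings inside the range window. A minimal sketch follows; the function name is hypothetical, and document B is taken to be the eight words AA BB CC AA EE FF GG CC, which is inferred from the positions quoted in the steps and should be treated as an assumption.

```python
# Document B as inferred from the positions quoted in the steps (an assumption).
DOC_B = "AA BB CC AA EE FF GG CC".split()


def candidates(doc_b, position, rng):
    """All contiguous word strings inside position +/- rng (1-based, clipped to the document).

    Each result is (text, start, end) so that later steps can tell identical
    words at different positions apart.
    """
    lo = max(1, position - rng)
    hi = min(len(doc_b), position + rng)
    found = []
    for start in range(lo, hi + 1):
        for end in range(start, hi + 1):
            found.append((" ".join(doc_b[start - 1:end]), start, end))
    return found


x1 = candidates(DOC_B, position=1, rng=1)  # first occurrence of X
x2 = candidates(DOC_B, position=4, rng=2)  # second occurrence of X
```

For X1 this yields AA, AA BB, and BB; for X2 it yields the fifteen word strings listed above, since a window of five words contains 5 + 4 + 3 + 2 + 1 contiguous strings.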
  • Step 5: The returns of the first occurrence of X (position 1) are compared to the returns of the second occurrence of X (position 4) and matches are determined. In this case the associations for X1 and X2 are compared, and the matches in the two documents provided. Note that identical returns (or word occurrences or word strings) in the overlap between the two ranges can be reduced to a single occurrence.
  • the word at position 2 is BB; this is returned both for the first occurrence of X (when operated on by the range) and the second occurrence of X (when operated on by the range). Because this same word position is returned for both X1 and X2, the word is counted as one occurrence. If, however, the same word is returned but not in an overlap area (i.e., the same word position is not returned for both X1 and X2, but the results happen to return the identical word), then the word is counted twice. In this case the return for word X is AA, since that word (AA) occurs in both association returns for X1 and X2. Note that the other word that occurs in both association returns is BB; however, as described above, since that word is the same position (and hence the same word) reached by the operation of the range on the first and second occurrences of X, the word can be disregarded.
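The comparison in Step 5 can be sketched by keeping track of where each returned word string sits in document B. The names `candidates` and `matches` are hypothetical, and document B is again assumed to be AA BB CC AA EE FF GG CC as inferred from the quoted positions.

```python
# Document B as inferred from the positions quoted in the steps (an assumption).
DOC_B = "AA BB CC AA EE FF GG CC".split()


def candidates(doc_b, position, rng):
    """Contiguous word strings inside position +/- rng, as (text, start, end)."""
    lo, hi = max(1, position - rng), min(len(doc_b), position + rng)
    return [(" ".join(doc_b[s - 1:e]), s, e)
            for s in range(lo, hi + 1) for e in range(s, hi + 1)]


def matches(returns_1, returns_2):
    """Word strings present in both returns at different positions in document B.

    A string reached at the very same position by both ranges is a single
    occurrence in the overlap and is disregarded.
    """
    spans_1 = {}
    for text, start, end in returns_1:
        spans_1.setdefault(text, set()).add((start, end))
    found = set()
    for text, start, end in returns_2:
        if any(span != (start, end) for span in spans_1.get(text, ())):
            found.add(text)
    return found


x1 = candidates(DOC_B, 1, 1)
x2 = candidates(DOC_B, 4, 2)
shared = matches(x1, x2)
```

BB appears in both returns but only at position 2, so it is dropped; AA appears at position 1 for X1 and at position 4 for X2, so it is the only match.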
  • Step 6: The next position of word X (position 9) (X3) is analyzed.
  • Step 8: Because no more occurrences of word X occur, the process is incremented by a word and a word string (or phrase) is tested. In this case the word string examined is "X Y", the first two words in document A. The same technique described in steps 2-7 is applied to this phrase.
  • Step 9: By looking at document A, we see that there is only one occurrence of the word string X Y. At this point the incrementing process stops and no database creation occurs. Because an end-point has been reached, the next word is examined (this process occurs whenever no matches occur for a word string); in this case the word in position 2 of document A is "Y".
  • Step 10: Applying the process of steps 2-7 for the word "Y" yields the following:
  • Step 11: End of range incrementation: Because the only possible match for word Y (word CC) occurs at the end of the range for the first occurrence of Y (CC occurred at position 3 in document B), the range is incremented by 1 at the first occurrence to return positions 1, 2, 3, and 4: AA, BB, CC, and AA; or the following forward permutations: AA, BB, CC, AA BB, AA BB CC, AA BB CC AA, BB CC, BB CC AA, and CC AA. Applying this result still yields CC as a possible translation for Y. Note that the range was incremented because the returned match was at the end of the range for the first occurrence (the base occurrence for word "Y"); whenever this pattern occurs an end of range incrementation will occur as a sub-step (or alternative step) to ensure completeness.
  • Step 12: Since no more occurrences of "Y" exist in document A, the analysis increments one word in document A and the word string "Y Z" is examined (the next word after word Y). Incrementing to the next string (Y Z) and repeating the process yields the following:
  • Word string Y Z occurs twice in document A: position 2 and 7;
  • Possibilities for Y Z at the first occurrence are AA, BB, CC, AA BB, AA BB CC, BB CC;
  • Possibilities for Y Z at the second occurrence are EE, FF, GG, CC, EE FF, EE FF GG, EE FF GG CC, FF GG, FF GG CC, and GG CC;
  • Extending the range yields the following for Y Z: AA, BB, CC, AA BB, AA BB CC, AA BB CC AA, BB CC, BB CC AA, and CC AA.
  • Step 13: Since no more occurrences of "Y Z" exist in document A, the analysis increments one word in document A and the word string "Y Z X" is examined (the next word after word Z at position 3 in document A). Incrementing to the next phrase (Y Z X) and repeating the process (Y Z X occurs twice in document A) yields the following:
  • Permutations are EE, FF, GG, CC, EE FF, EE FF GG, EE FF GG CC, FF GG, FF GG CC, and GG CC.
  • Step 14: Incrementing to the next word string (Y Z X A) finds only one occurrence; therefore the word string database creation is completed and the next word is examined: Z (position 3 in document A).
  • Step 15: Applying the steps described above for Z, which occurs 3 times in document A, yields the following:
  • Returns for Z1 are: AA, BB, CC, AA, EE, AA BB, AA BB CC, AA BB CC AA, AA BB CC AA EE, BB CC, BB CC AA, BB CC AA EE, CC AA, CC AA EE, and AA EE;
  • Step 16: Incrementing to the next word string yields the word string Z X, which occurs twice in document A. Applying the steps described above for Z X yields the following:
  • Returns for Z X1 are: BB, CC, AA, EE, FF, BB CC, BB CC AA, BB CC AA EE, BB CC AA EE FF, CC AA, CC AA EE, CC AA EE FF, AA EE, AA EE FF, and EE FF.
  • Step 17: Incrementing, the next phrase is Z X A, which only occurs once, so the next word (X) in document A is examined.
  • Step 18: Word X has already been examined in the first position. However, the second position of word X, relative to the other document, has not been examined for possible returns for word X. Thus word X (in the second position) is now operated on as in the first occurrence of word X, going forward in the document:
  • Step 19: Incrementing to the next word string (since no more occurrences of
  • Step 20: Applying the process described above for the second occurrence of word Z yields the following:
  • Step 21: Incrementing by one word yields the word string Z X; this word string does not occur in any more (forward) positions in document A, so the process begins anew at the next word in document A - "X". Word X does not occur in any more (forward) positions of document A, so the process begins anew. However, the end of document A has been reached and the analysis stops.
  • Step 22: The final association frequency is tabulated combining all the results from above. There is insufficient data to return results for other words and phrases in document A. Note that many possible associations occur for word CC in document B, as either an individual word or a word string in document A. As more document pairs are examined containing word CC in language B, the association frequencies will become statistically more reliable such that a word (or, possibly a word string) will exist as the translation for word CC.
  • In another embodiment, the database creation technique of the present invention may be utilized in a variety of ways to create the cross-language associations.
  • the database may be created by simply matching every word and word string (or phrase) occurring in document A with a range of words in document B (using the range techniques described above), without comparison to multiple occurrences of the word, and without range incrementing techniques.
  • This method utilizes the principle of cross-language association to create the database in a different manner than that described above.
  • the word count in each document is established to create an appropriate ratio.
  • the ratio is used for comparative range positioning, as described below. In this example, document A has twenty words, while document B has fifteen words, for a ratio of 4:3. Thus, every four words of document A equate to three words of document B.
  • a segment of words is established for the word strings, or phrases, to be examined.
  • This segment can be determined according to common language rules; e.g., a segment can be a sentence or paragraph.
  • the length of the segment is user defined and can be any fragment of word strings desired.
  • the segments will correspond to the sentences in each respective document, although larger segments are usually more effective than single sentences to create the associations of the present invention because there exists a larger base of potential associations to fill the database.
  • Positions of words are determined by their respective word count location in any document. Using the example, the positions of the word "the" are one, five, nine, and fifteen (the first, fifth, ninth, and fifteenth words in the document).
  • the target words are determined by using the word ratio to determine the relative point in document B, and applying the range to that word position in document B (the range is user defined as described in the first embodiment).
  • the relative position of a word in document B is determined by applying the ratio calculated above.
  • the word "the" occurs in the first, fifth, ninth, and fifteenth positions of document A. These positions correspond to relative positions 1, 4, 7, and 11 of document B.
  • Frequency range applied to preceding and following words in document B equals word positions 1-3 in Document B. This determination occurs by taking the position +/- the frequency range, or 1 +/- 2, or -1 through 3. Ignoring the negative and null positions returns a word position result of 1-3 in Document B.
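The relative positions and the clipped window in this embodiment can be checked with a short sketch. The source does not state how fractional positions are rounded; round-half-up is assumed here because it reproduces the figures quoted for "the", and both function names are hypothetical.

```python
import math


def relative_position(pos_a, n_words_a, n_words_b):
    """Scale a 1-based position in document A by the word-count ratio (assumed round-half-up)."""
    return max(1, math.floor(pos_a * n_words_b / n_words_a + 0.5))


def clipped_window(pos_b, rng, n_words_b):
    """Positions pos_b +/- rng in document B, ignoring negative and null positions."""
    return max(1, pos_b - rng), min(n_words_b, pos_b + rng)


# Document A has twenty words and document B fifteen (ratio 4:3).
the_in_a = [1, 5, 9, 15]
the_in_b = [relative_position(p, 20, 15) for p in the_in_a]
first_window = clipped_window(the_in_b[0], rng=2, n_words_b=15)
```

With these assumptions the positions of "the" map to 1, 4, 7, and 11, and the first window is positions 1 through 3 of document B.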
  • Matches are AAA (twice), BB and CCC.
  • the present invention increments the number of words examined by one. In the first example the word examined was "the" (the first word in Document A). Incrementing, the next word string to be analyzed is "the sky".
  • the process is then incremented a word and the process repeated for "the sky is.” This process yields as a potential match only the first occurrence AAA BB CCC, as there are no other occurrences.
  • the end of the first segment has been reached, defined by the user as that indicated by punctuation in Document A.
  • the next step is to take the SECOND word in the first segment and continue the iterative process described above - in the example the analysis would include “sky,” “sky is,” and “sky is blue” yielding the following as matches: “sky” occurs in positions 2 and 10 in Document A; which yields 2 and 7 as relative positions in Document B; which yields AAA BB CCC AAA as the first match and EEE DDDD AAA BB and FFF as the second match; which yields AAA and BB as possible associations to be stored in the database.
  • The next incremented word in segment one is “blue”, which yields AAA BB CCC AAA and EEE as possible associations to be stored in the database.
  • the analysis is now up to the end of segment one.
  • the next segment is the sentence "the grass is green." Since "the" has already been analyzed the next word portion to be analyzed is "the grass," followed by "the grass is", "the grass is green", "grass", "grass is", "grass is green", and "green."
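The order in which word strings are taken from successive segments could be sketched as below. The segment splitter (sentence-ending punctuation), the function names, and the two-sentence sample text are assumptions for illustration. Note that this sketch also emits "is green", which the list above does not mention; no filtering of such strings is attempted here.

```python
import re


def segments(text):
    """Split text into segments at sentence-ending punctuation (assumed segment rule)."""
    return [part.split() for part in re.split(r"[.!?]+", text.lower()) if part.strip()]


def forward_strings(segment):
    """Every word string that starts at some word and extends forward within the segment."""
    return [" ".join(segment[i:j])
            for i in range(len(segment))
            for j in range(i + 1, len(segment) + 1)]


def new_strings_per_segment(text):
    """Word strings to analyze in each segment, skipping any already analyzed earlier."""
    seen, plan = set(), []
    for segment in segments(text):
        fresh = [s for s in forward_strings(segment) if s not in seen]
        seen.update(fresh)
        plan.append(fresh)
    return plan


plan = new_strings_per_segment("The sky is blue. The grass is green.")
```

The first segment produces the ten forward strings of "the sky is blue"; the second segment skips "the" and "is" because they were already analyzed and starts with "the grass".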
  • the sky includes clouds and stars
  • segment sentences (“Went to school today. She walked to the school on the street.") can be analyzed by extending the segment to incorporate the person ("she") into the first sentence when the present invention acts to translate languages.
  • these two embodiments are representative of the technique used to create associations.
  • the techniques of the present invention need not be limited to language translation; in a broad sense, the techniques will apply to any two embodiments of the same idea that may be associated, for at its essence foreign language translation merely exists as a paired association with one idea (the word or phrase).
  • the present invention may be applied to associating data, sound, music, video, or any wide ranging concept that exists as an idea, including ideas that can represent any sensory (sound, sight, smell, etc.) experiences. All that is required is that the present invention analyze two embodiments (in language translation, the embodiments are documents; for music, the embodiments might be digital representations of a music score and sound frequencies denoting the same composition, and the like).
  • an embodiment of the present invention loads, by either mechanical, electrical, or other means, certain associations into the database.
  • for example, it is possible to load the database with foreign language equivalents of the English words it, his, her, an, a, of - or any common words - to create the association database more accurately, more efficiently, and with a faster resolution.
  • the present invention would automatically return the foreign language equivalents of certain words loaded into the database.
  • This embodiment allows the association database creation technique of the present invention to accommodate common words that may skew the analysis.
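A preloaded table of common-word associations might be combined with learned associations as sketched below. Everything here is hypothetical: the seed entries, the function names, and in particular the idea of dropping already-claimed targets from a candidate list, which is only one possible reading of "accommodate common words that may skew the analysis".

```python
from collections import Counter

# Hypothetical seed entries (English -> Spanish); not taken from the source.
PRELOADED = {"a": "un", "of": "de", "her": "su"}


def translate_word(word, learned, preloaded=PRELOADED):
    """Return a preloaded association if one exists, else the most frequent learned one."""
    if word in preloaded:
        return preloaded[word]
    counts = learned.get(word)
    return counts.most_common(1)[0][0] if counts else None


def drop_preloaded_targets(candidates, preloaded=PRELOADED):
    """Remove candidate strings already claimed by a preloaded common word (assumed policy)."""
    claimed = set(preloaded.values())
    return [c for c in candidates if c not in claimed]


learned = {"friend": Counter({"amigo": 60, "gato": 25, "burro": 15})}
answer = translate_word("friend", learned)
remaining = drop_preloaded_targets(["de", "amigo", "un"])
```

In this sketch a preloaded word is answered directly from the table, while other words fall back to the association counts built up from document pairs.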
  • an embodiment can utilize common associations to create and recognize word patterns. For example, it is possible to load associations into the database (e.g., "President" for "Clinton") such that the association database accommodates situations where the text means President Clinton, but only the word "president" is utilized as an abbreviation.
  • because the cross-language association exists in its broad sense as a cross-idea association technique for creating a database of possible associations, the results may be manipulated when an association is established.
  • if each "idea" is assigned an association to an electromagnetic wave (tone), it will be possible to create an "electromagnetic association" of the idea.
  • data in the form of an idea
  • data can be manipulated into electromagnetic waves and transferred at once over conventional telecommunications infrastructure.
  • that machine will synthesize the waves into separate components and, given the associations, present the individual ideas that were represented by the electromagnetic associations.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
EP02744486A 2001-06-21 2002-06-21 Cross-idea association database creation Withdrawn EP1397754A4 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US29947201P 2001-06-21 2001-06-21
US299472P 2001-06-21
PCT/US2002/019587 WO2003001403A1 (en) 2001-06-21 2002-06-21 Cross-idea association database creation

Publications (2)

Publication Number Publication Date
EP1397754A1 (de) 2004-03-17
EP1397754A4 (de) 2006-05-10

Family

ID=23154946

Family Applications (1)

Application Number Title Priority Date Filing Date
EP02744486A Withdrawn EP1397754A4 (de) 2001-06-21 2002-06-21 Cross-idea association database creation

Country Status (9)

Country Link
EP (1) EP1397754A4 (de)
JP (1) JP2004531832A (de)
KR (1) KR20040007741A (de)
CN (1) CN1520558A (de)
CA (1) CA2447229A1 (de)
EA (1) EA006182B1 (de)
IL (1) IL158749A0 (de)
WO (1) WO2003001403A1 (de)
ZA (1) ZA200309843B (de)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1786954B (zh) * 2005-12-20 2010-05-05 无敌科技(西安)有限公司 多语多本综合查询方法及其系统

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5579224A (en) * 1993-09-20 1996-11-26 Kabushiki Kaisha Toshiba Dictionary creation supporting system
US6236958B1 (en) * 1997-06-27 2001-05-22 International Business Machines Corporation Method and system for extracting pairs of multilingual terminology from an aligned multilingual text

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0728819A (ja) * 1993-07-07 1995-01-31 Kokusai Denshin Denwa Co Ltd <Kdd> 対訳辞書自動作成方式
JPH09128396A (ja) * 1995-11-06 1997-05-16 Hitachi Ltd 対訳辞書作成方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of WO03001403A1 *

Also Published As

Publication number Publication date
KR20040007741A (ko) 2004-01-24
JP2004531832A (ja) 2004-10-14
WO2003001403A1 (en) 2003-01-03
EA200400059A1 (ru) 2004-04-29
CA2447229A1 (en) 2003-01-03
CN1520558A (zh) 2004-08-11
EP1397754A4 (de) 2006-05-10
ZA200309843B (en) 2005-01-19
IL158749A0 (en) 2004-05-12
EA006182B1 (ru) 2005-10-27

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20031121

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO SI

A4 Supplementary search report drawn up and despatched

Effective date: 20060323

17Q First examination report despatched

Effective date: 20061106

STAA Information on the status of an EP patent application or granted EP patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20070103