WO2008086889A1

WO2008086889A1 - Transcription device for automatic transcription and transphrasing and corresponding methods

Info

Publication number: WO2008086889A1
Application number: PCT/EP2007/050418
Authority: WO
Inventors: Emil Müller; Francois RÜF
Original assignee: Netbreeze Gmbh
Priority date: 2007-01-16
Filing date: 2007-01-16
Publication date: 2008-07-24

Abstract

The invention relates to a transcription device and corresponding methods for the computer-aided transcription and/or transphrasing of non-bijectively assignable elements of a first (20) and second (50) group by means of an automated transcription device (10), wherein by means of a filter module (113) based on a coding of a first transcription (40), a plurality of transcription variations are generated by variation with indexed filler elements. Each transcription variation is associated with an increment stack (116). For each transcription variation, a corresponding search element is generated. By means of the transcription device (10), databases (71, ..., 74) that are arranged in a decentralized manner are accessed via a network (70), wherein the corresponding increment stack (117) is incremented accordingly by means of a trigger module (111) with each triggering of a search element (1211, ..., 1212). Based on the cumulative increment stacks (117), probability parameters are generated, and, by means of a comparison module (114), a certain transcription is uniquely selected based on the probability parameters.

Description

Transcription device for automated transcription and transphrasing and corresponding method

The invention relates to a transcription device and a corresponding method for the computer-aided transcription and/or transphrasing of non-bijectively assignable elements of a first and second group. In particular, the invention relates to transcription devices for transcription and/or transphrasing in automated search engines and conversion devices, wherein first search terms and/or first search sentences can be linked to second search terms and/or search sentences by means of a transcription device.

Transcription (from Latin "trans", across, and "scribere", to write), i.e. the rewriting of one term into another or, more generally, the assignment of an element of one group to an element of another, is a problem that has long been known in the art and that appears in the most diverse fields and forms. Examples include: in biology, the rewriting of a gene from DNA into RNA; in linguistics, the rendering of a spelling or a phoneme in a script other than the original one (e.g. from the Cyrillic alphabet into the Latin alphabet); in musicology, the transcription of one notation into another (e.g. on a change of key) as well as the transfer of a sounding work into notation; in qualitative social research, the transfer of an interview into an evaluable form; in linguistics, and in particular in conversation analysis, the transfer of spoken language, conversations or even gestures into a fixed written form; in scholarly editing, the letter-exact copying of a text; in film analysis, the transfer of a film into a written form; and in business, the usual name for the typing-up of the spoken word by a transcriptionist, an in-house typing service or an external typing office.

All these problems are ultimately based on a mapping problem. If the elements can be assigned bijectively, that is, if the assignment is unique in both directions, then it is often a mere coding problem. Bijectivity exists when each element can be transcribed into another element, and only into one single other element. The two groups of elements are then uniquely linked by the transcription. In many technical problems, however, bijective transcriptions are not possible. This is the case, for example, if one element can be assigned to several others by transcription, e.g. in the transcription of terms (names of persons, companies, places, etc.) from one alphabet into another (e.g. from Latin letters into Cyrillic). In particular, this type of transcription may attempt to make the pronunciation that applies in one language (alphabet) accessible to readers of another language (alphabet). For this reason, silent (unpronounced) characters, for example, cannot be coded, since they are often dictated by the language and may appear more or less arbitrary. The assignment is also usually not easy if it is bijective in principle but a group has too many elements and no general rules can be established. This can occur, for example, in the transcription of continuous or analog groups (groups with a very large number of elements) into groups with discrete elements.
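
The distinction drawn above can be made concrete with a small sketch. It assumes that a transcription is given as a finite lookup table (a Python dictionary); the function names `is_bijective` and `invert` and the toy alphabets are illustrative assumptions, not part of the described device.

```python
def is_bijective(mapping, targets):
    """Return True if `mapping` is a one-to-one assignment onto `targets`."""
    images = list(mapping.values())
    injective = len(set(images)) == len(images)  # no two sources share a target
    surjective = set(images) == set(targets)     # every target is reached
    return injective and surjective


def invert(mapping):
    """Invert a reversible lookup table; refuse a non-reversible one."""
    if not is_bijective(mapping, mapping.values()):
        raise ValueError("mapping is not reversible: several sources share a target")
    return {target: source for source, target in mapping.items()}


# Toy alphabets, purely illustrative.
reversible = {"a": "1", "b": "2", "c": "3"}
ambiguous = {"a": "1", "b": "1", "c": "3"}  # "a" and "b" collide on "1"
```

For `reversible`, the inverse is again a plain lookup table, which is the "mere coding problem" case. For `ambiguous`, `invert` raises an error, because the target "1" no longer determines a single source element.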

Script-based transcription may be, for example, the representation of certain terms from one script by means of a phonetic notation, or in a form adapted to the pronunciation rules of a target language. Each transcription system is geared to users who speak a particular target language. The German transcription, as used for example in the Duden, can serve as a guideline for the rendering of names written in Cyrillic. The same can apply, for example, to Greek names or phrases. The prior art usually distinguishes between: a) Transcription as a pronunciation-based representation of speech by means of a phonological or phonetic notation, or by means of another basic alphabet as a phonetic substitute. One advantage is that non-native speakers, for example, are enabled to pronounce the word reasonably correctly; b) Transliteration as a script-based, letter-by-letter and, if necessary, reversible conversion of a word from one script (e.g. Cyrillic) into another (e.g. Latin), often with the help of diacritical marks. One advantage is that specialists can represent the exact spelling of the word in the other script when, for some reason, it cannot be printed directly, either because the corresponding type or fonts are not available or because (for example in library catalogs) a single alphabet is necessary for sorting; c) Transcription in science (e.g. sociology, education, economics), which also means the writing-down of verbal data (mostly interviews or videos). Such transcripts are needed in qualitative social research for qualitative data analysis.

As an example of transcription, a comparison of various transcriptions from Cyrillic into Latin letters (using the names of two Russian writers) can be taken:

Tables of transcription and transliteration systems exist for many languages, such as Bulgarian, Macedonian, Russian, Serbian, Ukrainian and Belarusian. In Japanese, the transcription of Japanese into Latin script is called ローマ字 (rōmaji, Roman characters). There are several transcription systems. Two well-known and widely recognized ones are the Hebon-shiki system (in German: Hepburn-System) and the Kunrei-shiki system (in German: Kunrei-System). The former was popularized by the American missionary Hepburn; the latter was devised by the Japanese government of the time and follows the systematics of the kana table. For example, Japan's sacred mountain 富士山 (often referred to in German as "Fuji") is written "Huzisan" in the Kunrei system and "Fujisan" in the Hepburn system.

For voicing and vowels, the following applies:

In Hebrew, there is the particular difficulty of deciding whether it is to be treated as one language or several (Biblical, Tiberian Hebrew, Haskalah Hebrew, Israeli). For Israeli Hebrew itself there are several ongoing discussions. In Hebrew, the difference between a purely phonological and a morpho-phonological transcription can easily be shown. Kibúts - Qibbúṣ can serve as an example. The first spelling is purely Israeli and reflects the modern standard pronunciation. The second records the classical spelling, with q because the letter is ק and not כ (that most Israelis today pronounce both sounds identically is irrelevant, because those who pronounce them identically do so consistently but still always write them correctly). The "bb" arises because there is a dagesh in the bet, and ṣ is written instead of ts. This preserves the kinship with the Arabic ṣ, and at the same time one Hebrew character corresponds to exactly one transcription or transliteration symbol. Mixed forms such as Kibbutz and Qibutz are less convincing. In both spellings, the acute accent marks the stressed syllable. Other transcriptions reproduce nuances of vowel sounds that are neither written nor spoken in Israel, or indicate whether a vowel is unwritten, written with a vowel sign, or (additionally) noted by a consonant. As further examples, tapuach - tapúaḥ and michtav - miḵtav can be taken. The first transcription makes no distinction between ח and כ, because most Israelis do not pronounce them differently. Newscasters have to (it is the official norm), and Israelis who pronounce ח like כ often say themselves that this is "wrong"; the more precise transcription remains unambiguous even for the incorrect pronunciation. The same applies, for example, to bayäd - ba-yäd and kDshetire - kD-se-tire. In the first case, exactly what is written in Hebrew is reproduced. In the second case, letter clusters are also respected (spaces and punctuation marks are rendered as such), but in addition words or function particles are separated and linked by hyphens.

As shown, the rules for transcription from one element to another are usually not unique, but can only be found in the context of language usage. This has made the automation of transcription difficult or impossible in most cases. Codings were difficult to create because languages can typically be very large. At the same time, the codings (one-to-one assignments of the elements in a lookup table) had to be kept up to date permanently and at great expense.

The available search engines of the prior art can roughly be divided into four categories: robots/crawlers, metacrawlers, search catalogs with search options, and catalogs or link collections. The functionality of robots/crawlers, i.e. search robots, is characterized by a process (the crawler) that moves through the network, e.g. the Internet, from network node to network node or from web site to web site, sending the content of every web document it finds back to its host. The host computer indexes the web documents sent by the crawler and stores the information in a database. Every search request by a user accesses the information in this database. Prior art crawlers usually consider every piece of information to be relevant, so any web document found anywhere is indexed by the host machine. Examples of such robots/crawlers include, among others, Google™, Altavista™ and Hotbot™. Metacrawlers differ from robots/crawlers in that they offer a search through a single search facility, while the answer is additionally generated by a variety of other systems of the network. The metacrawler thus serves as a front end to a variety of other systems. The response to a search request from a metacrawler is typically limited by the number of these other systems. Examples of metacrawlers include, among others, MetaCrawler™, LawCrawler™ and LawRunner™.

Another option is catalogs with or without search options. They are characterized by a special selection of links, which are structured and/or organized by hand and stored in a corresponding database. In the case of a catalog with search options, the system searches the manually stored information for the desired search term when a search request is made. In the case of a catalog without search options, the user must find the desired information in the list of stored links himself, for example by manually clicking or scrolling through the list. In the latter case, the user himself decides which information from the list is more relevant to him and which is less relevant. Catalogs are naturally limited by the capacity and the priorities of the editor(s). Examples of such catalogs include Yahoo!™ and FindLaw™. Catalogs fall under the category of portals and/or vortals. Portals, and to a certain extent proprietary databases such as FindLaw.com™ or WestLaw.com™, try to solve the problem in different ways. Portals attempt to gain an overview of selected sites manually by having editors "surf" the Internet, i.e. judge the content and compile relevant data sources or sites. An editor is able to search, read and evaluate an average of about 10 to 25 sites per day, of which usually only 1 or 2 contain documents with the desired quality or information. It is clear that portals are very inefficient in terms of time, cost and effort for the provider if the goal is to provide a comprehensive index of all data available on the Internet on a topic. For this reason, Internet portals usually provide links only to the start/main pages of the various sites. Since the availability of data on the Internet is highly dynamic, it may even be said that a complete and up-to-date collection of all available data will hardly ever be possible with this procedure.

Vertical portals, so-called vortals, are generally portals that restrict their offer/selection of information to a specific area. Vortals therefore have intrinsically the same disadvantages as the portals discussed above. In fact, these disadvantages are even more pronounced in vortals, because their subject limitation sets a much higher standard for the quality and accuracy of indexing. This makes the task of searching, reading and assessing a critical amount of information even more difficult and time-consuming. An example of such a vortal is FindLaw.com™, which has been offered and developed since 1995.

One of the main problems of many capture systems, especially web engines, is the language problem and the problem of transcription. Newly appearing names and terms can hardly ever be captured by a system in their transcription in all languages and spellings. Web engines therefore fail to find many relevant data and items of information. International patent application WO 03/065248 A2 shows a system which attempts to solve the language and transcription problem by means of a multi-language index. Documents can be searched in several languages in parallel or evaluated accordingly. Finally, US patent application US 2005/0102270 A1 discloses a system which, in addition to indexing, attempts to organize a plurality of found documents by means of tabulation based on hierarchical index parameters (index, subindex, etc.), so that the user gets thematically structured access to the documents. However, the purely tabular breakdown of the documents cannot give the user any information about how the subject areas are linked to each other and how they relate to each other in terms of relevance. With a large number of found documents, the user is just as lost as with a conventional relevance listing. In other words, both applications are based on coding, of whatever nature, and can hardly ever be automated on the basis of this approach.

It is an object of this invention to provide a new transcription device and a corresponding method for the computer-aided transcription and/or transphrasing of non-bijectively assignable elements of a first and second group, which do not have the above-mentioned disadvantages of the prior art. In particular, the invention is intended to make it possible to realize a transcription device which, without any further action, adapts itself dynamically to new word usage, in particular to newly appearing names, and automatically proposes the correct transcription. Likewise, the transcription device should do without an elaborate coding of words and be producible with minimal effort.

According to the present invention, this object is achieved in particular by the elements of the independent claims. Further advantageous embodiments also emerge from the dependent claims and the description.

In particular, these objects are achieved by the invention in that, for the computer-aided transcription and/or transphrasing of non-bijectively assignable elements of a first and a second group by means of an automated transcription device, different combinations of indexed filler elements are generated by means of a Monte Carlo module and stored in a database in association with the corresponding index parameters; in that a first transcription is generated by means of definable transcription parameters, the transcription parameters used in each case being encoded according to their transcription site; in that, by means of a filter module and based on the encoding of the first transcription and the corresponding transcription sites, a plurality of transcription variations is generated by variation with the combinations of indexed filler elements, each transcription variation being associated with an increment stack; in that a corresponding search element is generated for each transcription variation and decentralized databases are accessed by the transcription device via a network, the corresponding increment stack being incremented by means of a trigger module each time a search element is triggered; and in that probability parameters are generated based on the accumulated increment stacks and, by means of a comparison module, a specific transcription is uniquely selected based on the probability parameters. In particular, the filler elements may include, for example, phonograms that are phonetically irrelevant in the target language. Likewise, the filler elements may include, for example, meaningful, affirmative or attenuating filler words. Among other things, the invention has the advantage that transcription devices can be fully automated for the first time, even for transcription problems that cannot be fully captured by definable transcription methods. The network may include, for example, the international backbone IP network. Furthermore, it has the advantage that transcriptions which could otherwise be realized only with great effort and time, e.g. by means of a lookup table, i.e. a one-to-one encoding of the elements to be assigned, can be captured directly. New names and terms are also detected dynamically and used correctly by the transcription device according to the invention. This was not possible in the prior art.

In an embodiment variant, the automated transcription device comprises a control and monitoring module for controlling web engines and/or conversion devices, wherein source databases are additionally accessible by means of the transcription device. This embodiment variant has the advantage, among other things, that these systems can automatically access a previously definable set of source databases from a network, in particular from the Internet (e.g. web sites, chat rooms, e-mail forums, etc.), and scan them according to previously definable search criteria, regardless of language, script and spelling. Thus, the system not only generates a "hit list" of web sites with corresponding content found on the Internet, but also allows the aforementioned screening of predefinable sources and their systematic, and thus quantitatively relevant, evaluation according to the desired and defined content criteria, independently of language, script and spelling. Because the transcription device is updated dynamically, the system can, for the first time in the art, actually "monitor" the defined sources independently and over a longer period of time, even if language and spelling usage change, for example when new spellings are introduced (e.g. by the Duden) or new names appear.

In another embodiment, the first group is assigned to the second group by means of the transcription device, the assignment of the first group into the second group not being surjective, while the second group is assigned to the first group by means of a coding module of the transcription device, the assignment of the second group to the first group being surjective. This variant has, among other things, the same advantages as the previous embodiments. In particular, the second group may be based, for example, on the Cyrillic alphabet. This has the advantage that transcriptions in languages such as Bulgarian, Macedonian, Russian, Serbian, Ukrainian and Belarusian can be captured easily. Another advantage is that web engines based on the transcription device according to the invention can easily capture web sites, in particular newsgroups, etc. In particular, the filler elements and/or transcription variations may include not only Cyrillic but also, for example, Hebrew letters. This has the advantage that transcribed terms are captured in the corresponding languages, such as ancient or modern Hebrew.

In a further embodiment, the scorecard with the found data records and/or references to the found data records is stored in a content module of a central unit, where it is accessible to a user. Among other things, this variant has the advantage that the system can be used, for example, as a monitoring, control and/or warning system for the user.

In another embodiment variant, a user profile is created on the basis of user information, and user-specifically optimized data are generated by means of a repackaging module from the found data records and/or references to found data records stored in the content module, taking into account the data of the user profile; these user-specifically optimized data are made available to the user, stored in the content module of the central unit. As a variant, different user profiles for different communication devices of the user can be stored in association with that user. Furthermore, data on user behavior can, for example, also be recorded automatically by the central unit and stored in association with the user profile. Among other things, this variant has the advantage that the user's different access options can be taken into account and the system can be optimized user-specifically.

At this point, it should be noted that the present invention relates not only to the method according to the invention but also to a transcription device for carrying out this method. Furthermore, it is not limited to the said device and the corresponding method, but also relates to a computer program product for implementing the method according to the invention. Hereinafter, embodiments of the present invention are described by way of examples. The examples of the embodiments are illustrated by the following figures:

FIG. 1 schematically shows the mode of operation of a transcription device 10 according to the invention for the computer-aided transcription and/or transphrasing of non-bijectively assignable elements of a first group 20 and a second group 50 by means of the automated transcription device 10.

FIG. 2 likewise schematically illustrates the mode of operation of a transcription device 10 according to the invention for the computer-aided transcription and/or transphrasing of non-bijectively assignable elements of a first group 20 and a second group 50 by means of the automated transcription device 10. The method is shown schematically in more detail.

FIG. 3 likewise shows a schematic representation of an embodiment of the transcription method by means of the transcription device 10.

Figure 1 schematically illustrates an architecture that may be used to implement the invention. In this embodiment, for the computer-aided transcription and/or transphrasing of non-bijectively assignable elements of a first group 20 and a second group 50 by means of the automated transcription device 10, different combinations of indexed filler elements are generated by a Monte Carlo module 112 of the transcription device 10 and stored in a database 115 in association with the corresponding index parameters. The filler elements may include, for example, phonetically irrelevant phonograms. However, the filler elements may also include, for example, meaningful, affirmative or attenuating filler words. For example, the Monte Carlo module 112 can generate transcriptions probabilistically (e.g. purely randomly or according to a probability distribution), which are then used for further processing/analysis. It is important to note, however, that the insertion of the filler elements normally follows predefined rules, as described below. Whether or not a rule for inserting a filler element is applied when the different transcriptions are generated is then decided probabilistically, for example by means of the Monte Carlo module. Likewise, it is important to point out that the transcription device, or the corresponding method, is based as a whole on the probability distribution of all possible generated transcriptions and triggers accordingly. In other words, the transcriptions themselves are normally not generated probabilistically with respect to the filler elements, since, as stated, the insertion of the filler elements can follow predefined rules; only the application or non-application of a filler rule is probabilistic.
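
The split between fixed insertion rules and a probabilistic decision about applying them can be sketched as follows. This is a minimal illustration under stated assumptions, not the Monte Carlo module 112 itself: each insertion site is reduced to a yes/no decision, and the names `sample_rule_applications`, `p_apply` and `seed` are hypothetical.

```python
import random


def sample_rule_applications(num_sites, num_samples, p_apply=0.5, seed=0):
    """Sample on/off decisions for a fixed set of insertion rules.

    Each sample is a tuple of booleans, one per insertion site: True means
    the predefined filler rule for that site is applied, False means it is
    skipped. Duplicate samples collapse into one combination.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    combinations = set()
    for _ in range(num_samples):
        combo = tuple(rng.random() < p_apply for _ in range(num_sites))
        combinations.add(combo)
    return sorted(combinations)


# Three insertion sites, twenty random draws.
combos = sample_rule_applications(num_sites=3, num_samples=20)
```

Each tuple in `combos` would then index one combination of filler elements. For a small number of sites, enumerating all combinations exhaustively would work just as well; sampling only becomes interesting when the number of sites makes full enumeration impractical.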

By means of definable transcription parameters of a base module, a first transcription 40 is generated for a selected element of the first group 20, wherein the transcription parameters used in each case are encoded according to their transcription site. By means of a filter module 113, based on the coding of the first transcription 40 and the corresponding transcription sites, a plurality of transcription variations is generated by variation with the combinations of indexed filler elements, each transcription variation being associated with an increment stack 116. For each transcription variation, a corresponding search element is generated, and the transcription device 10 accesses decentralized databases 71, ..., 74 via a network 70, wherein the corresponding increment stack 117 is incremented by means of a trigger module 111 each time a search element 1211, ..., 1212 is triggered. Based on the accumulated increment stacks 117, probability parameters are generated and, by means of a comparison module 114, a specific transcription is uniquely selected based on the probability parameters. The network 70 may include, for example, the international backbone IP network. However, the network 70 can also include, for example, communication networks such as a GSM or UMTS network, or a satellite-based mobile radio network, and/or one or more fixed networks, for example the public switched telephone network, the worldwide Internet or a suitable LAN (Local Area Network) or WAN (Wide Area Network). In particular, it also includes ISDN and xDSL connections. A transcription device 10 thus accesses network nodes connected to source databases 71, ..., 74 via the network 70, and data of the source databases 71, ..., 74 are selected or triggered based on the transcription variations. According to the present invention, the transcription device 10 is bidirectionally connected to the network nodes or source databases 71, ..., 74 via the communication network 70.
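
The counting and selection steps described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the decentralized databases 71, ..., 74 are replaced by in-memory lists of strings, a "trigger" is a plain substring match, and the function name `select_transcription` is hypothetical.

```python
def select_transcription(variations, databases):
    """Pick the variation that is triggered most often in the databases.

    One counter per variation plays the role of the increment stack; the
    relative frequencies play the role of the probability parameters.
    """
    counters = {variation: 0 for variation in variations}
    for database in databases:
        for document in database:
            for variation in variations:
                # Each occurrence of the search element increments the counter.
                counters[variation] += document.count(variation)

    total = sum(counters.values())
    if total == 0:
        return None, {}  # nothing was triggered; no data-based proposal

    probabilities = {v: n / total for v, n in counters.items()}
    # On a tie, max() keeps the first variation; a real system would need
    # an explicit tie-breaking rule to make the selection unique.
    best = max(probabilities, key=probabilities.get)
    return best, probabilities


# Hypothetical stand-ins for the networked source databases.
variations = ["Ельцин", "Елцин", "Эльцин"]
databases = [
    ["Борис Ельцин выступил с речью.", "Ельцин"],
    ["Елцин"],
]
best, probabilities = select_transcription(variations, databases)
```

With this toy data, "Ельцин" is triggered twice and "Елцин" once, so `best` is "Ельцин". When no variation is found at all, the function returns `None`, which is the point at which a caller could fall back to a rule-based standard.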

The data to be triggered based on the search terms can, as shown, be stored at different locations in different networks or be locally accessible to the transcription device 10. The network nodes with the databases 71, ..., 74 may include WWW servers (HTTP: Hypertext Transfer Protocol / WAP: Wireless Application Protocol, etc.), chat servers, e-mail servers (MIME), news servers, e-journal servers, group servers or any other file servers, such as FTP (File Transfer Protocol) servers, ASP (Active Server Pages) based servers, or SQL-based servers (SQL: Structured Query Language), etc. By means of the transcription device, for example, elements of the first group 20 can be assigned to elements of the second group 50, the assignment of the first group 20 into the second group 50 not being surjective, while the second group is assigned to the first group by means of a coding module 11 of the transcription device, the assignment of the second group to the first group being surjective. The elements of the first group 20 and/or the second group 50 may include multimedia data, i.e. digital data such as text, graphics, images, maps, animations, moving images, video, QuickTime, sound recordings, programs (software), program-accompanying data and hyperlinks or references to multimedia data. These include, for example, the MPx (MP3) or MPEGx (MPEG4 or 7) standards, as defined by the Moving Picture Experts Group. In particular, elements of the first group 20 and/or the second group 50 may include data in HTML (Hyper Text Markup Language), HDML (Handheld Device Markup Language), WML (Wireless Markup Language), VRML (Virtual Reality Modeling Language) or XML (Extensible Markup Language) format. The second group may, for example, be based on the Cyrillic and/or Hebrew alphabet. The filler elements and/or transcription variations may include, for example, Cyrillic or Hebrew letters.

For transcriptions between elements written in Cyrillic and in Latin script, the common direction of transliteration is the conversion of Cyrillic names and terms for which there is no translation (i.e. names of persons and places, etc.) into Latin script. The aim of this transliteration is, for example, to render Russian terms in Latin characters in such a way that readers pronounce them phonetically correctly. For this direction, numerous standards are known in the art, for example: (i) ALA-LC (American Library Association & Library of Congress): widely used in North American publications; (ii) BGN/PCGN: the most common standard, which gives phonetically relatively good results for English speakers; (iii) GOST: developed in 1971 in the USSR and still being developed today. The latest version of this standard (GOST 7.79) is the official standard used in Russia and the other states of the former USSR.

In the transliteration of Cyrillic into Latin, there is usually no right or wrong, which follows from the different standards. For example, Михаил Горбачёв can be written as Mikhail Gorbachev or in several other Latin spellings. The situation is different for transcription from Latin into Cyrillic. This direction can be used, for example, to render English, French, German, etc. names in Cyrillic. This, too, is relatively easy, as there is no right or wrong. However, if originally Russian names that are available only in their Latin form are to be converted back into Cyrillic, things become more difficult, because for Russian names there is only one correct spelling in Cyrillic. It is an advantage of the invention that the above-mentioned standards (ALA-LC, BGN/PCGN, etc.) can be reversed by means of the transcription device 10 according to the invention, and that the back-transliterated names can finally be checked for correctness by means of the databases 71, ..., 74, in particular, for example, Google. For terms that do not exist in the databases 71, ..., 74, in particular on the Internet, the transcription device may, for example, use one of the standard methods mentioned above. However, it is a clear advantage that if the transcription device 10 makes a transliteration proposal by means of the databases 71, ..., 74 based on the method according to the invention, this proposal is certainly the correct one.

To generate the first transcription by means of definable transcription parameters of the base module for a selected element of the first group 20, the transcription device can use, for example, a combination of the two standards ALA-LC and BGN/PCGN. It is a peculiarity of both standards that the corresponding mappings of Cyrillic script into Latin script are not injective. This means that two different Cyrillic characters can be mapped to the same Latin character. For the reversal of the mapping, this means that one Latin character can produce two different Cyrillic variants. Russian also has silent characters (similar to the "h" in the German word "Fehler"), which cause consonants to be pronounced softer or harder. The two silent characters are ь, which softens the preceding consonant, and ъ, which makes the preceding consonant harder. These two characters cannot be taken into account by any of the transliterators and transcription devices of the prior art. Only by means of the transcription device 10 according to the invention can they be taken into account, for example for Russian names. The most prominent example, in which all prior art transliterators fail, is Boris Yeltsin, in Cyrillic Борис Ельцин. The third letter of the surname is the softening sign ь.
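
The consequence of a non-injective forward mapping can be seen by grouping the Cyrillic characters by their Latin image. The table below is a toy excerpt chosen for illustration only; it is not the actual ALA-LC or BGN/PCGN table, and it assumes a simplified convention in which the silent signs are dropped entirely.

```python
# Toy Cyrillic-to-Latin excerpt (illustrative assumption, not a real standard).
FORWARD = {
    "е": "e", "э": "e",  # two vowels collapse onto one Latin letter
    "и": "i", "й": "i",
    "л": "l",
    "ь": "", "ъ": "",    # silent signs leave no trace in the Latin form
}


def invert_multimap(forward):
    """Group source characters by their image; the result is one-to-many."""
    inverse = {}
    for cyrillic, latin in forward.items():
        inverse.setdefault(latin, []).append(cyrillic)
    return inverse


INVERSE = invert_multimap(FORWARD)
```

Here `INVERSE["e"]` lists both е and э, and the empty string maps back to both silent signs. The second case is the harder one: since nothing in the Latin text marks where a silent sign was dropped, a reverse transliteration has to consider inserting one at every plausible position.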

In the transcription device 10, in a first step, the text written in Latin script can, for example, be converted character by character into Cyrillic characters. In doing so, a copy of the result is created for each possible branch. At the end of this process there is a spelling for every variant that is theoretically possible under the phonetic rules. An example can be found in FIG. 3. In the conversion of individual letters, the following criteria can be taken into account: (i) whether the letter is a consonant or a vowel; (ii) whether the target letter is iotated (ju instead of u); (iii) whether the next letter is a consonant or a vowel; (iv) whether the letter is at the end or the beginning of a word; (v) whether the letter is part of a letter combination that is always transliterated as a unit. These five criteria determine the possible transliterations. They can be derived, for example, from the standards ALA-LC and BGN/PCGN. BGN/PCGN is a method by which Cyrillic terms, especially Russian expressions, can be rendered in Latin script. The procedure for Cyrillic is one of a larger set of BGN/PCGN procedures (currently 29 different languages are covered by BGN/PCGN). The BGN/PCGN procedures were developed by the United States Board on Geographic Names and the Permanent Committee on Geographical Names for British Official Use. The procedure for transliterating Cyrillic letters, especially Russian expressions, was adopted by the BGN in 1944 and by the PCGN in 1947. The transliteration uses only letters and punctuation marks that are available on the English version of standard keyboards. BGN/PCGN does not require any special characters, although the use of the interpunct (·) is permitted to avoid ambiguity.
Many publications use a simplified form of BGN/PCGN to render Russian terms in English, typically converting ë to yo, simplifying the endings -iy and -yy to -y, and omitting the apostrophes for ъ and ь. Edward Allworth, for example, uses a BGN/PCGN-based method in his book "Nationalities of the Soviet East - Publications and Writing Systems". It always renders е and ё as e and ë respectively and substitutes i for y in й, ю and я, making the procedure similar to a version of the ALA-LC system without diacritics. The following table illustrates the BGN/PCGN method with examples:

Cyrillic | Latin | Special provisions | Examples (Russian)
А (а) | A (a) | None | Азов = Azov; Тамбов = Tambov
Б (б) | B (b) | None | Барнаул = Barnaul; Кубань = Kuban'
В (в) | V (v) | None | Владимир = Vladimir; Ульяновск = Ul'yanovsk
Г (г) | G (g) | None | Грозный = Groznyy; Волгодонск = Volgodonsk
Д (д) | D (d) | None | Дзержинский = Dzerzhinskiy; Нелидово = Nelidovo
Е (е) | Ye (ye) | 1. at the beginning of a word; 2. after vowels; 3. after й; 4. after ь; 5. after ъ | 1. Елизово = Yelizovo; 2. Чапаевск = Chapayevsk; 3. Майер = Mayyer
З (з) | Z (z) | None | Вязьма = Vyaz'ma
И (и) | I (i) | None | Иркутск = Irkutsk; Апатиты = Apatity
Й (й) | Y· (y·) | Before а, у, ы or э. Mainly used for transliterating names from non-Russian languages from their Russian spelling. The use of the digraph is optional. | Кайафа = Kay·afa
Й (й) | Y (y) | All other cases | Йошкар-Ола = Yoshkar-Ola; Бийск = Biysk
К (к) | K (k) | None | Киров = Kirov; Енисейск = Yeniseysk
Л (л) | L (l) | None | Ломоносов = Lomonosov; Нелидово = Nelidovo
М (м) | M (m) | None | Менделеев = Mendeleyev; Каменка = Kamenka
Н (н) | N (n) | None | Новосибирск = Novosibirsk; Кандалакша = Kandalaksha
О (о) | O (o) | None | Омск = Omsk; Красноярск = Krasnoyarsk
П (п) | P (p) | None | Петрозаводск = Petrozavodsk; Серпухов = Serpukhov
Р (р) | R (r) | None | Ростов = Rostov; Северобайкальск = Severobaykal'sk
С (с) | S (s) | None | Сковородино = Skovorodino; Чайковский = Chaykovskiy
Т (т) | T (t) | None | Тамбов = Tambov; Мытищи = Mytishchi
У (у) | U (u) | None | Углич = Uglich; Дудинка = Dudinka
Ф (ф) | F (f) | None | Фурманов = Furmanov; Уфа = Ufa
Х (х) | Kh (kh) | None | Хабаровск = Khabarovsk; Прохладный = Prokhladnyy
Ц (ц) | Ts (ts) | None | Цимлянск = Tsimlyansk; Ельцин = Yel'tsin
Ч (ч) | Ch (ch) | None | Чебоксары = Cheboksary; Печора = Pechora
Ш (ш) | Sh (sh) | None | Шахтерск = Shakhtersk; Мышкин = Myshkin
Щ (щ) | Shch (shch) | None | Щелково = Shchelkovo; Ртищево = Rtishchevo
Ъ (ъ) | " | This sign does not occur at the beginning of a word. | Подъездной = Pod"yezdnoy
Ы (ы) | Y· (y·) | Before а, у, ы or э. Mainly used for transliterating names from non-Russian languages from their Russian spelling. The use of the digraph is optional. | Выудить = Vy·udit'
Ы (ы) | ·Y (·y) | After every vowel. Mainly used for transliterating names from non-Russian languages from their Russian spelling. The use of the digraph is optional. |
Ы (ы) | Y (y) | All other cases. This sign does not occur at the beginning of words of Russian origin. | Ыттык-Кель = Yttyk-Kel'; Тында = Tynda
Ь (ь) | ' | This sign does not occur at the beginning of a word. | Тюмень = Tyumen'
Э (э) | ·E (·e) | After every consonant except й. Mainly used for transliterating names from non-Russian languages from their Russian spelling. The use of this digraph is optional. | Двухэлементный = Dvukh·elementnyy
Э (э) | E (e) | All other cases | Электрогорск = Elektrogorsk; Радиоэлектроника = Radioelektronika
Ю (ю) | Yu (yu) | None | Юбилейный = Yubileynyy; Ключевская = Klyuchevskaya
Я (я) | Ya (ya) | None | Якутск = Yakutsk; Брянск = Bryansk
Тс (тс) | T·s (t·s) | Mainly used for transliterating names from non-Russian languages from their Russian spelling. The use of this digraph is optional. | Соответствие = Sootvet·stviye
Шч (шч) | Sh·ch (sh·ch) | Mainly used for transliterating names from non-Russian languages from their Russian spelling. The use of this digraph is optional. | Веснушчатый = Vesnush·chatyy

For reference, see, e.g., U.S. Board on Geographic Names, Foreign Names Committee Staff, 1994, Romanization Systems and Roman-Script Spelling Conventions, pages 84-85ff.
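The context-dependent rule for the letter е in the BGN/PCGN method (Ye at the beginning of a word, after vowels and after й, ь and ъ; E otherwise) can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `romanize` and the table `BASE` are chosen for the sketch, the letter ё and the optional interpunct digraphs are left out, and only a leading capital letter is restored.

```python
VOWELS = set("аеёиоуыэюя")

# Context-free part of the mapping (ё and the interpunct digraphs are omitted).
BASE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ж": "zh", "з": "z",
    "и": "i", "й": "y", "к": "k", "л": "l", "м": "m", "н": "n", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f", "х": "kh",
    "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch", "ъ": '"', "ы": "y",
    "ь": "'", "э": "e", "ю": "yu", "я": "ya",
}


def romanize(word: str) -> str:
    """Romanize a single Russian word with the simplified rules above."""
    out = []
    prev = None  # previous Cyrillic letter, None at the start of a word
    for ch in word.lower():
        if ch == "е":
            # "ye" word-initially, after vowels and after й, ь, ъ; otherwise "e".
            initial_or_soft = prev is None or prev in VOWELS or prev in "йьъ"
            out.append("ye" if initial_or_soft else "e")
        else:
            # Characters without a rule are passed through unchanged.
            out.append(BASE.get(ch, ch))
        # A non-letter such as a hyphen starts a new word for the rule.
        prev = ch if ch.isalpha() else None
    result = "".join(out)
    return result.capitalize() if word[:1].isupper() else result


name = romanize("Ельцин")
```

The same letter е thus produces two different Latin outputs depending on its neighbours, which is one reason why the reverse direction is ambiguous.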

ALA-LC includes tables for the Slavic alphabets and is a set of standards for transliterating texts and terms from a variety of writing systems; it is used primarily in North American libraries and publications. The latest version was published by the American Library Association and the Library of Congress in 1997. The unambiguous version of the method requires diacritics and tie characters joining individual letters, which are often omitted in practice. ALA-LC also publishes transliteration tables for a wide variety of languages.

Cyrillic | Latin | Special provisions | Examples (Russian)
А (а) | A (a) | None | Азов = Azov; Тамбов = Tambov
Б (б) | B (b) | None | Барнаул = Barnaul; Кубань = Kubanʹ
В (в) | V (v) | None | Владимир = Vladimir; Ульяновск = Ulʹi͡anovsk
Г (г) | G (g) | None | Грозный = Groznyĭ; Волгодонск = Volgodonsk
Д (д) | D (d) | None | Дзержинский = Dzerzhinskiĭ; Нелидово = Nelidovo
Е (е) | E (e) | None | Елизово = Elizovo; Чебоксары = Cheboksary
Ё (ё) | Ë (ë) | None | Ёлкин = Ëlkin; Озёрный = Ozërnyĭ
Ж (ж) | Zh (zh) | None | Жуков = Zhukov; Лужники = Luzhniki
З (з) | Z (z) | None | Звенигород = Zvenigorod; Вязьма = Vi͡azʹma
И (и) | I (i) | None | Иркутск = Irkutsk; Апатиты = Apatity
Й (й) | Ĭ (ĭ) | None | Йошкар-Ола = Ĭoshkar-Ola; Бийск = Biĭsk
К (к) | K (k) | None | Киров = Kirov; Енисейск = Eniseĭsk
Л (л) | L (l) | None | Ломоносов = Lomonosov; Нелидово = Nelidovo
М (м) | M (m) | None | Менделеев = Mendeleev; Каменка = Kamenka
Н (н) | N (n) | None | Новосибирск = Novosibirsk; Кандалакша = Kandalaksha
О (о) | O (o) | None | Омск = Omsk; Красноярск = Krasnoi͡arsk
П (п) | P (p) | None | Петрозаводск = Petrozavodsk; Серпухов = Serpukhov
Р (р) | R (r) | None | Ростов = Rostov; Северобайкальск = Severobaĭkalʹsk
С (с) | S (s) | None | Сковородино = Skovorodino; Чайковский = Chaĭkovskiĭ
Т (т) | T (t) | None | Тамбов = Tambov; Мытищи = Mytishchi
Э (э) | Ė (ė) | None | Электрогорск = Ėlektrogorsk; Радиоэлектроника = Radioėlektronika
Ю (ю) | I͡U (i͡u) | None | Юбилейный = I͡ubileĭnyĭ; Ключевская = Kli͡uchevskai͡a
Я (я) | I͡A (i͡a) | None | Якутск = I͡akutsk; Брянск = Bri͡ansk

It should be noted that in one embodiment the automated transcription device 10 may include a control and monitoring module for controlling web engines and/or conversion devices, whereby additional source databases 71, ..., 74 become accessible by means of the transcription device 10. "Additionally accessible" means that data, or databases containing data, in other scripts or writing systems can be captured by the web engines and interpreted uniformly. As an exemplary embodiment, the selected transcriptions can be stored in a content module of the transcription device 10 so that they are accessible to a user. In order to grant access to the content module, it can be useful (for example, for billing the service used) to have the transcription device 10 identify a specific user by means of a user database. For example, personal identification numbers (PINs) and/or so-called smart cards can be used for identification. Smart cards normally require a card reader in the communication device. In both cases, the name or other identification of the user as well as the PIN is transmitted to the transcription device 10 or to a trusted remote server. An identification module or authentication module decrypts the PIN (if necessary) and checks it against the user database. As a variant, credit cards can also be used to identify the user. If the user uses his credit card, he can also enter his PIN. Typically, the magnetic stripe of the credit card contains the account number and the encrypted PIN of the authorized holder, i.e. in this case the user. The decryption can be done directly in the card reader itself, as is common in the prior art. Smart cards have the advantage that they allow greater security against fraud through additional encryption of the PIN. This encryption can be performed either with a dynamic numeric key, which contains e.g. the time, day or month, or with another algorithm. The decryption and identification then do not take place in the device itself, but externally via the identification module.
Another option is a chip card inserted directly into the user's communication device. The chip card can be, for example, a SIM card (Subscriber Identity Module) or a smart card, each chip card being assigned a telephone number. The assignment can be made, for example, via an HLR (Home Location Register), in which the IMSI (International Mobile Subscriber Identity) is stored in association with a telephone number, for example an MSISDN (Mobile Subscriber ISDN). This assignment then enables a unique identification of the user.
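The PIN check against the user database can be sketched as follows. Note that this sketch swaps in a different technique from the one described above: instead of decrypting a stored PIN, it stores only a salted hash and compares hashes, which is a common way to verify a secret. The names `hash_pin` and `identify`, the record layout of the user database and the iteration count are assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Dict, Tuple

ITERATIONS = 100_000  # assumed work factor for the sketch

# Hypothetical user database: user id -> (salt, salted hash of the PIN).
UserDatabase = Dict[str, Tuple[bytes, bytes]]


def hash_pin(pin: str, salt: bytes) -> bytes:
    """Derive a salted hash of the PIN with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, ITERATIONS)


def identify(user_db: UserDatabase, user_id: str, pin: str) -> bool:
    """Return True only if the user exists and the PIN matches."""
    record = user_db.get(user_id)
    if record is None:
        return False
    salt, stored_hash = record
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(hash_pin(pin, salt), stored_hash)


# Usage with a fixed example salt (a real system would use a random salt).
salt = b"0123456789abcdef"
user_db: UserDatabase = {"user-1": (salt, hash_pin("4711", salt))}
is_known = identify(user_db, "user-1", "4711")
```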

As an exemplary embodiment, to start the transcription device 10 the user can transmit a transcription request for the corresponding query from a communication device via the network 70 to the transcription device 10 by means of a front end. The transcription request data can be entered via input elements of the communication device. The input elements may include, for example, keyboards, graphical input means (mouse, trackball, eye tracker with Virtual Retinal Display (VRD), etc.), but also IVR (Interactive Voice Response), etc. The user has the option of determining at least part of the transcription request data himself. This can happen, for example, when the communication device prompts the user to fill out a corresponding front-end query via an interface. The front-end query may in particular include additional authentication and/or fees for the query. In the transcription device 10, the transcription request data can, for example, be checked, and the transcription is carried out if they satisfy determinable criteria. For user-specific requirements it can make sense, for example, to create a user profile based on user information, wherein user-optimized data are generated by means of a repackaging module from the transcriptions stored in the content module and/or from references to transcriptions already performed, taking the data of the user profile into account. The user-specifically optimized data can then be made available to the user, for example, in the content module of the transcription device 10. It may be advantageous for different user profiles to be stored for one user, each assigned to a different communication device of this user. For the user profile, data on user behavior can, for example, also be acquired automatically by the transcription device 10 and stored in association with the user profile.
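The check of the transcription request data against determinable criteria can be sketched as a list of predicates that must all hold before the transcription is carried out. The class `TranscriptionRequest`, its fields and the three example criteria are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class TranscriptionRequest:
    """Hypothetical request data sent from the front end."""
    user_id: str
    text: str
    authenticated: bool = False


Criterion = Callable[[TranscriptionRequest], bool]

# Example criteria; which criteria apply is left open in the description.
DEFAULT_CRITERIA: Sequence[Criterion] = (
    lambda r: r.authenticated,          # user has been identified
    lambda r: bool(r.text.strip()),     # there is something to transcribe
    lambda r: len(r.text) <= 200,       # assumed upper bound on query length
)


def accept(request: TranscriptionRequest,
           criteria: Sequence[Criterion] = DEFAULT_CRITERIA) -> bool:
    """Return True if the request satisfies every criterion."""
    return all(criterion(request) for criterion in criteria)


ok = accept(TranscriptionRequest("user-1", "Yeltsin", authenticated=True))
```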

References

10 Transcription device

11 Coding module

12 Transcription module

121 Trigger module

1211 - 1212 Triggered elements

122 Monte Carlo module

123 Filter module

124 Comparison module

125 Database with combinations of filler elements

126 Memory unit with transcription variants and assigned increment stack

20 First group of elements

30 Coded transcription

31 Transcribed transcription

40 First transcription

41 - 47 Transcription variants

45 Transcription taken over

50 Second group of elements

70 Network

71, ..., 74 Decentralized databases

Claims

1. A method for computer-aided transcription and/or transphrasing of non-bijectively assignable elements of a first (20) and a second (50) group by means of an automated transcription device (10), characterized

in that different combinations of indexed filler elements are generated by means of a Monte Carlo module (112) of the transcription device (10) and stored in a database (115) based on the assigned index parameters,

in that a first transcription (40) is generated for a selected element of the first group (20) by means of definable transcription parameters of a base module, wherein the transcription parameters used in each case are coded according to their transcription location,

in that a multiplicity of transcription variations are generated by means of a filter module (113), based on the coding of the first transcription (40) and the corresponding transcription locations, by variation with the combinations of indexed filler elements, each transcription variation being assigned an increment stack (116),

in that a corresponding search element is generated for each transcription variation and decentralized databases (71, ..., 74) are accessed by means of the transcription device (10) via a network (70), wherein the corresponding increment stack (117) is incremented by means of the trigger module (111) each time a search element (1211, ..., 1212) is triggered, and

in that probability parameters are generated based on the cumulative increment stacks (117), and a specific transcription is uniquely selected by means of a comparison module (114) based on the probability parameters.

2. The method according to claim 1, characterized in that the automated transcription device (10) comprises a control and monitoring module for controlling web engines and/or conversion devices, wherein additional source databases (71, ..., 74) become accessible by means of the transcription device (10).

3. The method according to any one of claims 1 or 2, characterized in that the filler elements comprise phonetically non-relevant phonograms.

4. The method according to any one of claims 1 to 3, characterized in that the filler elements comprise sense-retaining, affirmative or attenuating filler words.

5. The method according to any one of claims 1 to 4, characterized in that elements of the second group (50) are assigned to elements of the first group (20) by means of the transcription device, wherein the assignment of the first group (20) to the second group (50) is not surjective, while the second group is assigned to the first group by means of a coding module (11) of the transcription device, wherein the assignment of the second group to the first group is surjective.

6. The method according to claim 5, characterized in that the second group is based on the Cyrillic alphabet.

7. The method according to claim 5, characterized in that the filler elements and/or transcription variations comprise Cyrillic or Hebrew letters.

8. The method according to any one of claims 1 to 7, characterized in that the network (70) comprises the international backbone IP network.

9. A transcription device (10) for computer-aided transcription and/or transphrasing of non-bijectively assignable elements of a first (20) and a second (50) group, characterized

in that the transcription device (10) comprises a Monte Carlo module (122) for generating different combinations of indexed filler elements, the combinations being stored in a database (125) based on the assigned index parameters,

in that the transcription device (10) comprises a base module for generating a first transcription based on definable transcription parameters, the transcription parameters used in each case being coded according to their transcription location,

in that the transcription device (10) comprises a filter module (123) by means of which, based on the coding of the first transcription and the corresponding transcription locations, a multiplicity of transcription variations can be generated by variation with the combinations of indexed filler elements, each transcription variation being assigned an increment stack (126),

in that the transcription device (10) comprises a trigger module (121) by means of which a corresponding search element can be generated for each transcription variation, wherein decentralized databases (71, ..., 74) are accessible via a network (70) by means of a network interface of the transcription device (10), and the corresponding increment stack (126) can be incremented by means of the trigger module (121) each time a search element (1211, ..., 1212) is triggered, and

in that probability parameters can be generated by means of the transcription device (10) based on the cumulative increment stacks (126), and a specific transcription can be uniquely selected by means of the comparison module (124) based on the probability parameters.

10. A transcription device (10) for transcription and/or transphrasing in automated search engines and conversion devices, wherein first search terms or first search sentences (20) can be linked to second search terms or search sentences (50) by means of the transcription device (10), characterized

in that the transcription device (10) comprises a base module for generating a first transcription based on definable transcription parameters, wherein the transcription parameters used in each case can be coded according to their transcription location,

in that the transcription device (10) comprises a trigger module (121) by means of which a corresponding search element can be generated for each transcription variation, wherein decentralized databases (71, ..., 74) are accessible via a network (70) by means of a network interface of the transcription device (10), and the corresponding increment stack (126) can be incremented by means of the trigger module (121) each time a search element (1211 - 1212) is triggered, and

in that probability parameters can be generated by means of the transcription device (10) based on the cumulative increment stacks (126), and a specific transcription can be uniquely selected by means of the comparison module (124) based on the probability parameters.

11. A computer program product which can be loaded into the internal memory of a digital computer and comprises software code sections with which the steps according to any one of claims 1 to 8 can be carried out when the product runs on a computer.