AUDIOVISUAL LEARNING SYSTEM
TECHNICAL FIELD
The object of the invention is an audiovisual system for learning languages intended for a production and distribution system for an audiovisual content, which system is based on subtitling of a program content with the source language and a target language. The object of the invention is also an information-forwarding unit used in the system and a method for performing the system.
PRIOR ART
The importance of knowing languages is growing steadily in a unifying Europe. E-learning, and in particular the learning of foreign languages, has become a central goal, e.g. in the EU. Nowadays, e-learning solutions already make quite extensive use of text, graphics, animations and videos. In addition, the learning material can contain interactive tasks to activate the learner. The focus is, however, on text and graphics, e.g. because of the high cost of producing video. In particular, the translation needed in language learning is expensive.
Thus, usually, the learning takes place interactively by means of material brought by the teacher or material given by the teacher to the student, such as books, texts and video tapes. There are also interactive computer programs for learning foreign languages.
US-patent 5,810,599 is presented as prior art, wherein an audiovisual content is used for teaching languages. The system is an interactive language teaching tool between a computer and audiovisual playing device, in which different interactive properties have been programmed as digital data, such as questions to the student, tasks, grammatical explanations and so on, as well as functions to handle the student's answers. The system requires an audiovisual video to be used as source material, which video is put
in a computer, and a disc, to which the computer has access and which contains said digital data connected to the audiovisual data.
From the television and video world it is known to subtitle audiovisual programs either in the original language, e.g. for people who are hard of hearing, or with the intention of learning a language when the presentation language is a foreign language. Mostly, the subtitling is a translation of the original content into the target language so that the program can be understood.
Furthermore, US-patent 5,815,196 is presented as prior art, which presents a video conference system having automatic voice recognition and interpretation, and US-patent 6,128,434, which describes a system wherein the spoken language and the language of the subtitling can be freely chosen from stored languages.
OBJECT OF THE INVENTION
The object of the invention is the production of systems and teaching programs that are based on subtitling and translation of audiovisual contents.
Especially, the object of the invention is to develop a language teaching system that is simpler than the methods of prior art by making use of existing audiovisual broadcasting and streaming programs, such as for example TV programs or video films or audio and video sendings presented on internet.
Furthermore, the object of the invention is to develop a language teaching system that can be published to different channels, such as DigiTV, DVD etc.
SUMMARY OF THE INVENTION
The invention is an audiovisual learning system for a foreign language working in a production and distribution channel system for audiovisual contents, which system
performs the forwarding of the information of an audiovisual content and the distribution, subtitling, translation and presentation to the end user. The distribution channel system involves modules, wherein the audio and/or visual content to be presented exists, which content is presented in a foreign language, means to convert said content into a form to be further processed, means for subtitling of the program with the original language and means for the translation of the subtitled program. The subtitled program contains a video, text and a time code for the synchronization of the text with the video. Furthermore, the system has means for presenting the ready service in a receiver device, and means, with which the information is forwarded between the different modules, and furthermore there is a receiver device, wherein the final program is presented. The invention is mainly characterized in that the means for the translation of the subtitled program perform the translation and the subtitling of the translation in accordance with selected translation modes and combine the audio and the visual content and subtitlings. The receiver device has means, with which different translation modes for the subtitling of the original language can be selected, which translation modes are presented in a menu of different functions, from which the selection possibilities are selected in accordance with language, target of translation, and the appearance of the upper and the lower text in accordance with sub menus.
The object of the invention is also an information-forwarding unit used in the system and a method, which performs the system.
The production and distribution channel system of the audiovisual content is preferably a multi-channel publication system, or a so-called multimedia publication system, which comprises an analog or digital video, and a distribution channel system of analog and digital TV programs for broadcasting and streaming servers.
The receiver can have means, with which different translation modes of the original subtitling can be selected, and the content provider might also have preselected the translation modes that are offered to the consumer. In one translation mode, the translated text consists of translations of words or phrases of the original subtitling. The places to be translated and/or the translated places can be presented with a selected color, whereby the coloring of the translation can be set to proceed synchronously with the voice of the audiovisual content. Furthermore, the translated words can be in basic form or in the form of the original language, and the amount of translated words can be adjusted in different ways according to frequency statistics. Hereby, it can be optional to translate rare words, common words or all words. The subtitling of the starting language of the audiovisual content is preferably above the translated subtitling. The position of the upper or lower subtitling, the font size, the color and the background color can, however, be selected by the end user or by the content provider. The difficulty level of the translations can preferably also be selected.
The text mode is selected for example by means of a menu, wherein the language selection, the object of the translation, the appearance of the upper text and the appearance of the lower text, are selected. Hereby, the alternatives of the main values are further selected by means of sub menus by selecting one or more alternatives from the sub menus.
The translation is performed automatically or semi-automatically by means of a translation or subtitling editor. The subtitling or translation editor uses ready subtitling coming with the source material with time codes. The subtitling and the timing of the translation can also be made manually.
The program-repeating device is a digital TV, CD-ROM, IP Datacasting or Webcasting receiver, DVD receiver, an analog TV or an e-book.
The system uses a voice recognition program to create a subtitling in the original language and to synchronize voice and text.
Wordlist programs can be used for the translation of the subtitling in the original language, whereby one or more translations are found for each word or phrase, from which the expression most suitable for the sentence context is selected from the menu.
It is possible to produce such video material by means of the invention, which for teaching use advantageously is subtitled with both the source and the target language as the translation takes place semi-automatically or automatically by means of translation and subtitling editors.
Semi-automatic translation means a translation in which the system produces a translation of the words, but a human being has to check and correct mistakes or choose the right translation from alternatives.
The use of the invention is also very easy for the consumer as it is offered as an additional service, for example in connection with a TV program. Furthermore, performed as a DVB MHP application, the system of the invention requires less bandwidth from the digital TV transmission stream than, for example, a normal DVB subtitling, whereby bandwidth remains for the channel companies for sending of the services.
Furthermore, the invention enables the learning of a language in a natural way during normal life as the watcher follows normal TV-programs or videos, which lowers the threshold for studying.
In the following, the invention is presented in detail by means of figures. The intention is however not to restrict the invention to the details of the presentation because the scope of protection of the invention is defined by the patent claims.
FIGURES
Figure 1 presents generally distribution process alternatives in a multi-channel system, wherein a subtitling and translation service is produced in a raw material in accordance with the invention.
Figure 2 presents a general architecture view more in detail than figure 1 of different distribution process alternatives in a multi-channel system, wherein the invention can be used (except that the e-Book and the VHS receiver alternatives have not been shown in figure 2).
Figure 3 presents a more detailed partial figure of an alternative distribution way from the content provider to the watcher in a DVB-MHP environment (in a digital television).
Figure 4 presents an example of a translation mode to be used in the invention.
Figure 5 is a flow-scheme example of the invention and of possible alternative function modes.
DETAILED DESCRIPTION
Figure 1 presents generally production and distribution process alternatives in a multi-channel system, wherein a subtitling and translation service is produced for a raw material in accordance with the invention. The translation service of the invention can be applied to different source materials, of which digital TV programs 1 are one form; these can be sent via satellite, terrestrial transmission or cable. Thus, the multi-channel property means that one and the same content can be used in different terminals, such as in a mobile phone, in a computer connected to the internet, in a digital television etc. Simply put, the invention in accordance with figure 1 needs some kind of video or audio content 1 - 3 (see figure 2), which is subtitled with the original language. Pure audio is not suitable for all distribution channels but is suitable for applications such as voice books, webcast radio or music. In step 6, the subtitled video or audio is provided with a translation in accordance with the different translation modes of the invention and is sent, depending on the sending form, for distribution 4 - 5 (see figure 2) in accordance with different possibilities. Thus, the program can be received in different forms, such as an e-book 16, in IP Datacasting form (IPDC) 17, via internet 13, as a DVD video 14, as a CD-ROM 15, in a digital TV receiver 11, in an analog TV 12 or as a VHS video 18.
The digital video 2 is another channel form, with which the material can be brought, for example, through the internet by Web Streaming. Web Streaming is a technology by means of which one can watch video and listen to the radio directly from an internet server. In the Finnish language, the name "suoratoisto" ("direct repeating") is also used, as it is based on packet transmission technology, with which the program can be presented as the digital video or voice packets arrive.
The video and streaming technologies are different. The three most common technologies are the RealMedia, QuickTime and WindowsMedia technologies. Different technologies do not necessarily fit together; for example, QuickTimePlayer usually does not work with RealMedia or WindowsMediaStream files.
The streaming technology consists of hardware and software components, which work together for creating, storing and distributing media files in the network. Each streaming technology has three components that influence its use: servers and media files, plug-ins, and coding means.
Each streaming technology has its own server as its first component and its own media file type. Thus QuickTime, RealMedia and WindowsMedia each have their own servers which are designed to repeat files most preferably in its own form. The media files therefore have to fulfill the file format requirements of the server.
A second component, a plug-in for the web browser, for example MediaPlayer, may be needed to watch media of a given kind.
The third component is the creation and encoding of the file, in other words, the way in which the media files are created in a given format. Each technology has its own way to encode media files for presentation from the Streaming Server. Therefore, there is
needed a program of a given kind during the development of the media to convert files to the desired streaming format.
Bandwidth is an important factor in media transmission. For example, audio and video files need a high data rate and much bandwidth. For example, one second of video at television quality requires ca. 240 000 kb/s or 30 megabits/second of bandwidth to the computer. To prevent congestion of the channels, different solutions have been developed. The bandwidth can be increased, but this is not always technically possible.
Another solution is to make the media files smaller so that they pass more easily through the narrow channels of the internet. Big media files are therefore made smaller and are encoded for the internet. One way is to decrease the height and width of the display.
A third way is packaging, i.e. compression. All media files that are sent to the internet are packaged to some extent. After the packaging it is usually no longer possible to recover the whole file, and the quality decreases a little. Generally, the quality decreases the more the file is packaged.
When the video file is watched through the internet, the web browser and a possible plug-in component thus communicate with the Streaming Server. The server sends data evenly over the network to the computer as the video proceeds. The computer has to have a browser with an appropriate player or plug-in installed. These can be obtained free of charge from the internet. The plug-in works together with the browser to present streamed files encoded in a given format. The web server stores web pages or HTML files. The StreamingMedia files are normally kept in a special Streaming Server meant for this purpose.
A third form, in which the source material program can be received, is the analog video 3. The source material programs are sent in their entirety through a work server to the Broadcast Server 4 or, in the case of a digital video, to the Streaming Server 5. The intention of the invention is to perform the translation of the starting material subtitled with the original language, and the other treatments belonging to the invention, in module 6, wherein the translation of the original text and the treatment in accordance with the different modes of the invention take place.
So that services can be added, the starting material first has to be treated in modules 7, 8 and 9 by transcoding, coding and digitizing, depending on the form in which the starting material originally was and the kind of terminal to which the video is intended to be sent. DigiTV transmissions, which are sent for example with the MPEG-2 standard, might be transcoded again in module 7 if video material of a different standard is wanted. This is, however, not necessary, as the subtitling of the program can also be made for a video of the MPEG-2 standard.
MPEG-2 is a common encoding method for video and audio information and is used in the DVB MHP standard. Additional information can be found on the internet page http://mpeg.telecomitalialab.com/standards/mpeg-2/mpeg-2.htm. MPEG is short for Moving Picture Experts Group and is the name of a family of standards used for encoding audiovisual information (e.g. movies, video, music) in a digitally packaged format. The main advantage of MPEG compared to other video and audio coding formats is that the MPEG files are much smaller but still have the same quality. MPEG uses very sophisticated packaging techniques.
A digital video has to be re-encoded in module 8, and an analog video has to be digitized in module 9, so that further treatment is possible.
Voice recognition, with which the original program can be subtitled, is performed in module 10. The Speech Recognition system (for example DREmedia Auto Suite) is independent of the video format (MPEG, QuickTime etc.). Classic subtitling editors are not necessarily needed when a time code is used for the original subtitling and the voice recognition. The video format is selected in accordance with the distribution channel and terminal. A commercial program can be used for the subtitling, such as e.g. Polyscipt 3000, with which subtitlings can be created, edited and timed for television, movies and video programs. Alternatively, the subtitling can be performed manually by a human being. The module 10 in question then sends the text, the video and the time code, with which the speech is synchronized, to the subtitling, translation and text editor 6.
The translation in accordance with the invention is performed in the editor 6, as well as the addition of other services, which can be performed in accordance with different translation modes that are discussed later in connection with figures 3 and 4. Thereafter, the content is sent to the Content Manager, for example to the Broadcast Server 4 or Streaming Server 5, from which the program and the services added to it are sent further to the receiver of the consumer, which can e.g. be a digital TV 11, an analog TV 12, a computer that can receive videos through the internet, a DVD receiver or a PC 15.
For the publication of an e-book, there is a dedicated workstation, which is called e-Book Authoring. This is not shown in the figure. In a corresponding way, a program for CD-ROM comes through a DVD CD-ROM Authoring Server (not shown in the figure). Correspondingly, a program for a DVD receiver is sent through a DVD CD-ROM Authoring workstation (not shown). An internet receiver receives its programs from the Streaming Server and the IP Datacasting presentations. An analog TV receives its programs from the Broadcast Server or from an analog video. A digital TV receives its programs from the Broadcast Server. The program and the services added to it are sent further from the server to the receiver of the consumer, which can be for example a digital TV 11, an analog TV 12, a PC that receives videos in HTML form, a DVD receiver or a PC 15.
In the sending, files, such as the subtitling, translations, time codes, are combined with the analog video. The DVD disc is a standard with a storing capacity of more than 4 GB, while for example 700 MB can be put in a CD-ROM. DVD discs thus have six times more capacity compared with CD-ROM. DVD uses the MPEG-2-standard and the quality can thus be compared to a Broadcast image.
In figure 2, which is a more detailed view of an alternative of figure 1, the invention is described in a working digital TV environment, wherein the invention can be added as an added-value service to a TV channel. The service can be connected in the TV channel to given programs, such as the news, documentaries, movies or any other program.
The Digital Video Broadcasting (DVB) standard provides tools and mechanisms for fitting together different networks, devices and systems so that content-based services and the service content itself can be delivered to the consumer. The DVB standards are published by the European Telecommunications Standards Institute (ETSI); ETSI, the European Committee for Electrotechnical Standardization (CENELEC) and the European Broadcasting Union (EBU) co-operate in making the DVB standards.
The Multimedia Home Platform (MHP) defines the general interface between interactive digital applications and terminals, wherein these applications are used. MHP is used together with the DVB standard for Broadcast and interactive services in all sending networks, including satellite, cable and land sendings.
Detailed information of DVB standards can be found in internet on page www.dvb.org and of MHP standards from the page www.mhp.org.
In figure 3, there is presented an example of a distribution route from the content provider to the watcher in a digital television according to the DVB-MHP environment. The content provider's tool for the production of value-added services, such as subtitling of an original program, is presented with reference number 21. The DVB MHP architecture enables the distribution of content produced by content providers outside the TV channels, through the TV network all the way to the digital TV terminals of homes. The subtitling is sent through the network 22 to a content manager 23, which can for example be a television channel company or a service managed by a TV production company according to figure 1, and thereafter to the object carousel 24 at a television channel company or network manager (in Finland: Digita), which combines the application, in other words the subtitling and translation service according to the invention, with the original program. The data that is connected to the program, a CS subtitling in ASCII form, is a sequential file that defines the program time of the subtitling, the position of the text on the screen, the color of the font and a frequency number for each word of the source language.
The final program, in which the translation of the invention as well as other services are thus included, is then sent for example through a multiplexer MUX 5 and the network 6 to a digital receiver 7. Multiplexing is used to decrease transmission costs. Hereby, several channels use the same transmission route, for example an optic fiber. MUX is thus a common interface for services that are sent to digital receivers. Other audiovisual services are also sent through MUX 25. Reference numbers 26 - 28 show more generally that services are sent from the content provider 9 through an operator network or a general transmission network 10 to the consumer 11.
Figure 4 presents an example of how the original subtitling of the audiovisual program is translated according to the first embodiment of the invention. The upper subtitling of reference number 31 corresponds to the spoken, heard or sung content in subtitled form. In this embodiment, the lower subtitling 32 in figure 4 is a word translation of the upper text in a language chosen by the watcher. Reference number 33 presents how the words are aligned beneath each other according to the translation. Reference number 34 presents the words in the basic form in the translation. Reference number 35 presents the possibility of a time-synchronized coloring to make the spoken language easier to follow.
In this embodiment of the invention there are four describing properties:
1. The upper subtitling is made from the speech of the program in the original language
2. The lower subtitling is a word or phrase translation of the upper text
3. The translated words are aligned under the words they translate
4. The words are in their basic form in the translation
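The third property, alignment of each translation under its source word, can be illustrated with a minimal sketch. It assumes a fixed-width font and plain padding with spaces; the function name `align_subtitles` is hypothetical and the rendering in an actual terminal is not specified in this document. The word pairs are taken from the DataContainer example later in the text.

```python
def align_subtitles(pairs):
    """Return (upper, lower) lines with each translation under its source word.

    `pairs` is a list of (source_word, translated_word) tuples. An empty
    translation leaves a gap below the source word. Assumes a monospaced font.
    """
    upper_cells, lower_cells = [], []
    for source, translation in pairs:
        # Both cells of a column get the width of the longer of the two words.
        width = max(len(source), len(translation))
        upper_cells.append(source.ljust(width))
        lower_cells.append(translation.ljust(width))
    return " ".join(upper_cells).rstrip(), " ".join(lower_cells).rstrip()


upper, lower = align_subtitles([("Welcome!", "Tervetuloa!"), ("Please", "Olkaa hyvä")])
print(upper)
print(lower)
```

Each column starts at the same character position in both lines, so the watcher can read a translation directly below the word it belongs to.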
In the different embodiments of the invention, the watcher can choose the target language and, for example, the level of difficulty of the translations. In programming the level of difficulty, so-called frequency statistics / word lists of languages are used. The most common words of a language also appear most often and are thus translated in the "easy" mode, whereas the more difficult words are rarer and are therefore translated in the "difficult" mode, in which the common words are not shown. The more common words can, for example, be left out step by step: from the beginner's translation, all words within, for example, the first 1500 words of the frequency list are left out; from the intermediate's translation, the first 3500 words of the frequency list are left out; and for a person fluent in the language, words with a frequency number below 4500 are left out. The limit can also be chosen by the watcher himself.
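The step-by-step omission of common words can be sketched as a simple threshold filter. This is an illustration only: the level names and the function `words_to_translate` are hypothetical, and the sketch assumes that a frequency number is a rank in which 1 denotes the most common word. The thresholds 1500, 3500 and 4500 are the example figures given above.

```python
# Rank thresholds from the example figures in the text; level names are hypothetical.
LEVEL_THRESHOLDS = {"beginner": 1500, "intermediate": 3500, "fluent": 4500}


def words_to_translate(terms, level):
    """Return the words that are rarer than the threshold of the chosen level.

    `terms` is a list of (word, frequency_rank) tuples, where rank 1 is
    assumed to be the most common word of the language.
    """
    threshold = LEVEL_THRESHOLDS[level]
    # Words at or below the threshold are common and are left untranslated.
    return [word for word, rank in terms if rank > threshold]


terms = [("Please", 469), ("Welcome!", 3450)]
print(words_to_translate(terms, "beginner"))      # only the rarer word remains
print(words_to_translate(terms, "intermediate"))  # both words are left out
```

A threshold chosen by the watcher himself could be supported in the same way by passing a number instead of a level name.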
A "karaoke"-type coloring, in which the words are colored in synchronization with the voice, can be used to direct the watcher's attention. The traditional karaoke-type progression of the text can also be used. The lower text, i.e. the translation, can have a different color than the upper one, different words can have different colors etc. The channels for exploiting the invention are primarily digital TV, Web Casting, DVDs and CD-ROMs, and also VHS tapes and analog television.
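A time-synchronized coloring needs to know which word is being spoken at a given playback position. The following minimal sketch assumes that each word carries a start time and a duration in milliseconds, as in the DataContainer example later in the text; the function name `active_word_index` is hypothetical.

```python
def active_word_index(timings, position_ms):
    """Return the index of the word spoken at `position_ms`, or None.

    `timings` is a list of (start_ms, duration_ms) tuples in playback order.
    If two words overlap in time, the earlier one is returned.
    """
    for index, (start_ms, duration_ms) in enumerate(timings):
        if start_ms <= position_ms < start_ms + duration_ms:
            return index
    return None


# Start times and durations of the two words in the DataContainer example.
timings = [(45865, 400), (46215, 560)]
print(active_word_index(timings, 46000))
print(active_word_index(timings, 46500))
```

A receiver could call such a function on every display refresh and color the word at the returned index.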
Figure 5 presents a flowchart of how the invention works from the point of view of the consumer. In this case it is assumed that the service is a chargeable added service to the TV program. Naturally, the invention can also be arranged so that it belongs to the normal services of the channel.
In this embodiment, the watcher starts by choosing the program to be watched, for example from the digital television program services, in step 1. Thereafter, the watcher is offered a possibility to use the translation service of the invention, in step 2.
The operation of the translation service of the invention can be arranged in different ways. The translation service can be programmed so that the service provider has preselected different functions in a menu, whereby the consumer has more or fewer possibilities to select the details of the service. In this example it is, however, assumed that the watcher now selects, from menu 1, how the translation services should operate. In the menu it is then possible, in accordance with the different main menus, to choose the language, how the original text is translated, and the appearance of the upper and the lower text.
In figure 5, reference number 3b represents the selection possibilities of the watcher, i.e. the menu. Thus, at first the selection of language is made according to submenu 1 , whereby the watcher chooses that the original program shall be translated for example into English, Finnish or Swedish etc. Available languages are the ones the service provider has linked to the service. The service provider can then let the watcher choose from these language alternatives. In the future, for example in Finland, English, Finnish, Swedish and possibly some other languages could be offered as the target languages, and for example in France French and English could be offered etc. in accordance with the official languages and the spoken languages in the country of concern.
One of the big advantages of the invention is that it works as a language learning tool, whereby the translation is not necessarily made traditionally by translating all the content in detail, but by translating, for example, only the unusual words if the user has language skills; or, if a person wants to learn the basics, it could be chosen that only the most common words are translated. A mode in which the words of the translated texts are in basic form could also be available. The submenu 2 can, for example, be organized so that one or more alternatives can be chosen.
Furthermore, figure 5 presents a submenu 3 that could be an added service in the translation service of the invention or, more advantageously, a default value that the watcher himself can switch off. Thus, an indicator could be chosen for the progress of the speech, which could appear, as in karaoke, in the upper text, for example by coloring the text synchronously with its progress or by showing the text synchronously with the speech. In addition, the wanted color, size, font etc. could be chosen for the translated text. The same selections could be made for the lower text in the submenu 4.
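The four submenus can be thought of as one settings record per watcher. The sketch below is only an illustration under assumptions: the class and field names (`TranslationMode`, `TextAppearance`, `karaoke_coloring` and so on) and the default values are hypothetical and do not come from the invention as described.

```python
from dataclasses import dataclass, field


@dataclass
class TextAppearance:
    """Appearance options for one subtitle line (hypothetical defaults)."""
    color: str = "white"
    font_size: int = 24
    karaoke_coloring: bool = False


@dataclass
class TranslationMode:
    """One watcher's selections from submenus 1 to 4."""
    language: str  # submenu 1: target language
    # submenu 2: one or more alternatives for what is translated
    translate: set = field(default_factory=lambda: {"all words"})
    # submenu 3: upper text, with the speech indicator on by default
    upper_text: TextAppearance = field(
        default_factory=lambda: TextAppearance(karaoke_coloring=True))
    # submenu 4: lower text
    lower_text: TextAppearance = field(default_factory=TextAppearance)


mode = TranslationMode(language="fi", translate={"rare words", "basic form"})
mode.upper_text.karaoke_coloring = False  # the watcher switches the default off
print(mode)
```

A service provider that preselects some functions could simply fix the corresponding fields before the menu is shown.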
When the watcher has chosen the wanted translation mode, the invention can be arranged, for example, so that the watcher pays for the use of the invention in step 4, for example by means of the services of a normal mobile phone, digital television or other (computer) payment systems; the service can also be free for the watcher. The payment can also take place before the selection of the translation mode. After this the program can start (step 5).
The tool, the so-called DataContainer unit, can be seen as the fundamental unit of the information forwarding of the invention, with which the information is passed between the different modules of the production and distribution system of the invention (which is a multimedia publishing system, also called a multi-channel publishing system). The DataContainer consists of one element for each word of the audio and/or visual content. These elements are linked on the basis of a time code and are in time order, starting from the starting point of the audio and/or visual content. The elements consist of information-keeping components, in which information and metadata are created in the different steps of the information forwarding process.
The information forwarding process of the invention starts from the "raw material", or start information, of the audio and/or visual content. The DataContainer stores the information formed during the performance of the different steps, which information includes the start information, as well as information and/or metadata that are accomplished by the actions and created to the information keeping components.
The DataContainer is created at the beginning of the process and thus initially includes only the information-keeping components created for the information and metadata. Information is created in these components as the process proceeds, for example information on the word in question, its frequency, time code, color, font etc.
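The idea of time-ordered elements whose components are filled in step by step can be sketched as follows. This is a minimal illustration under assumptions: the class names `Term` and `DataContainer` and the method `add_term` are hypothetical, and the field names merely mirror the tags of the XML example given below in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    """One element of the container; components start empty and are filled later."""
    time_code_ms: int
    base: Optional[str] = None
    duration_ms: Optional[int] = None
    word_frequency: Optional[int] = None
    translations: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataContainer:
    terms: List[Term] = field(default_factory=list)

    def add_term(self, term: Term) -> None:
        """Add an element and keep all elements ordered by time code."""
        self.terms.append(term)
        self.terms.sort(key=lambda t: t.time_code_ms)


container = DataContainer()
container.add_term(Term(time_code_ms=46215))
container.add_term(Term(time_code_ms=45865))

# Later steps of the process fill in the components.
first = container.terms[0]
first.base = "Welcome!"
first.word_frequency = 3450
first.translations["fi"] = "Tervetuloa!"
print(container.terms)
```

Keeping the elements sorted by time code means that any module can walk through them in playback order without further bookkeeping.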
The elements of the DataContainer are delivered to the terminal in a form that depends on the transfer method, one example of which is XML or one of its dialects. The following is an example of a possible structure of the DataContainer.
EXAMPLE
In the example the DataContainer elements are marked with <term>.
The <context_subtitling base="en"> mark describes the service entity of the invention.
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- edited with XML Spy v 4.4 U (http://www.xmlspy.com) by Matti Vuorio (Icareus Ltd) -->
<context_subtitling base="en">
  <term>
    <!-- word in the source language -->
    <base>Welcome!</base>
    <!-- confidence in the word (out of 1000) -->
    <confidence>8821</confidence>
    <!-- time_code is the time of the start of the word from the beginning of the video file in ms -->
    <time_code>45865</time_code>
    <!-- duration is the length of the word in ms -->
    <duration>400</duration>
    <!-- the frequency order of the word in frequency dictionary -->
    <word_frequency>3450</word_frequency>
    <!-- lang source language word translated in target language(s) -->
    <lang id="de">Willkommen!</lang>
    <lang id="fr">Bienvenue!</lang>
    <lang id="fi">Tervetuloa!</lang>
  </term>
  <term>
    <!-- word in the source language -->
    <base>Please</base>
    <!-- confidence in the word (out of 1000) -->
    <confidence>9400</confidence>
    <!-- time_code is the time of the start of the word from the beginning of the video file in ms -->
    <time_code>46215</time_code>
    <!-- duration is the length of the word in ms -->
    <duration>560</duration>
    <!-- the frequency order of the word in frequency dictionary -->
    <word_frequency>469</word_frequency>
    <!-- lang source language word translated in target language(s) -->
    <lang id="de">Bitte</lang>
    <lang id="fr">S'il vous plaît</lang>
    <lang id="fi">Olkaa hyvä</lang>
  </term>
</context_subtitling>
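A terminal that receives such a structure could read it with any XML parser. The following sketch uses Python's standard `xml.etree.ElementTree` module on a shortened copy of the example; the function name `load_terms` and the shape of the returned dictionaries are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the example structure above.
SAMPLE = """<context_subtitling base="en">
  <term>
    <base>Welcome!</base>
    <time_code>45865</time_code>
    <duration>400</duration>
    <word_frequency>3450</word_frequency>
    <lang id="de">Willkommen!</lang>
    <lang id="fi">Tervetuloa!</lang>
  </term>
  <term>
    <base>Please</base>
    <time_code>46215</time_code>
    <duration>560</duration>
    <word_frequency>469</word_frequency>
    <lang id="de">Bitte</lang>
    <lang id="fi">Olkaa hyvä</lang>
  </term>
</context_subtitling>"""


def load_terms(xml_text, target_lang):
    """Parse the <term> elements and pick the translation for one language."""
    root = ET.fromstring(xml_text)
    terms = []
    for term in root.findall("term"):
        # Select the <lang> child whose id attribute matches the target language.
        translation = term.find(f"lang[@id='{target_lang}']")
        terms.append({
            "base": term.findtext("base").strip(),
            "time_code_ms": int(term.findtext("time_code")),
            "duration_ms": int(term.findtext("duration")),
            "word_frequency": int(term.findtext("word_frequency")),
            "translation": translation.text if translation is not None else None,
        })
    return terms


for entry in load_terms(SAMPLE, "fi"):
    print(entry["time_code_ms"], entry["base"], "->", entry["translation"])
```

If the requested language is not present in a term, the sketch returns `None` for the translation rather than failing, so that the upper subtitling can still be shown.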
Excluding the information in the DataContainer, the metadata is always presented in a terminal-dependent way, yet always in two lines in a way chosen by the customer.