NL2006294C2 - Website translator, system, and method. - Google Patents
- Publication number
- NL2006294C2
- Authority
- NL
- Netherlands
- Prior art keywords
- information
- website
- translated
- translator
- translation
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
Description
Website translator, system, and method
The present invention relates to a website translator, system, and method for providing a translated version of a webpage of a website in response to a HyperText Transfer Protocol (HTTP) request from a client computer, wherein the website is hosted on a first server having a first host name.
Translators are known that are incorporated in the web browser. A user wanting to have a specific webpage of a website translated first has to visit the webpage, which then is still in an undesired language. The user may then operate a translate button or other function to instruct the browser to translate any text content in the currently displayed webpage. Similarly, a script may be run which consults a remote computer translator to provide translations of the various text strings in the webpage.
The present invention provides an improved website translator. According to the present invention, the website translator comprises a first receiving unit arranged for receiving the HTTP request, and an extracting unit arranged for extracting a language identifier from the HTTP request and/or from an Internet Protocol (IP) address of the client computer, the language identifier corresponding to a language used and/or desired by a user of the client computer. The website translator further comprises a forwarding unit for forwarding the HTTP request to the first server, and a second receiving unit arranged for receiving HTML information from the first server in response to the HTTP request. A modifying unit is then used for modifying the received HTML information by replacing information to be translated in the received HTML information with a translated version thereof in correspondence with the used and/or desired language. Finally, a sending unit sends the modified HTML information to the client computer for display as the translated version of the website.
According to the present invention, the HTTP request is not directed to the first server on which the website is hosted. Instead, the HTTP request is directed to the website translator which is arranged in between the user and the first server.
An HTTP request comprises a host name of the server to which the HTTP request is directed as well as specifics of the web browser and/or computer system of the client computer from which the HTTP request originates. Examples are the IP address of the client computer, the web browser that is used, and the language that is used on the client computer. Part of this information can for instance be extracted from the browser settings and/or operating system settings, which are available in the HTTP request. The IP address is a form of location information enabling the geographical location of the user to be determined. Within the context of the present invention, both the IP address and the information regarding the language that is used on the client computer are examples of a language identifier as they both allow a language used and/or desired by the user of the client computer to be determined.
The website translator forwards the HTTP request to the first server which hosts the website to be translated. In this case, the website translator acts as a client requesting a webpage from the first server. However, the HTML information that is received in return, e.g. an HTML page, is not relayed directly to the user. Instead, relevant information is extracted from the HTML information, e.g. text portions between HTML tags, and is subsequently replaced by translated versions thereof. This is advantageous because the layout of the website can remain the same. Finally, the modified HTML information is sent to the user.
According to the present invention, the user need not have any knowledge about the location of the first server. Moreover, as the translation is done in real time, there is no need to store a copy of the translated website or webpage. Because the language in which the webpage is presented to the user always corresponds to the language settings already used and/or desired, the user is not first confronted with a webpage in a language he cannot understand.
According to the present invention, several sources of information can be used to determine the language used and/or desired by the user. It is possible to offer the user an option if these sources provide multiple languages. In such a case, a script can be run on the website translator that establishes communication between the user and the website translator and prompts the user to select between several options. The present invention may even be modified such that the extracting unit no longer extracts the desired information from the HTTP request but always determines this information using input from the user as described above.
In an embodiment of the present invention, the website translator further comprises a server database having stored therein a correlation between a host name extracted from the HTTP request and a host name of the first server. The extracting unit is then arranged to extract a host name from the HTTP request, whereas the forwarding unit is arranged to forward the HTTP request to the first server having a host name that correlates with the host name extracted from the HTTP request. For instance, the first server could have the host name "original.company.com", whereas the host name of the website translator is "company.com". In this case, the server database comprises the correlation between "original.company.com" and "company.com". The HTTP request comprises the host name "company.com". By using the information in the server database, the forwarding unit is able to send the HTTP request from the website translator to the first server having the host name "original.company.com" on which the website to be translated is hosted.
It should be noted that within the present invention, host name should be interpreted as an Internet host name that represents a domain name assigned to a host computer.
In a further embodiment of the present invention, the website translator has an Internet Protocol (IP) address linked to a plurality of host names, and the server database comprises a correlation for each of the host names with a host name of a respective first server.
The use of multiple host names for a single IP address allows the website translator to be operative for multiple websites at the same time. For instance, the website translator could be configured to provide translations of a webpage "page 1" hosted on a first server having host name "original.company1" corresponding to "company 1" and of a webpage "page 2" hosted on a first server having host name "original.company2" corresponding to "company 2". In this case, two host names could be attributed to the IP address of the website translator, e.g. "company1.com" and "company2.com". The server database then comprises a correlation between "company1.com" and "original.company1" and between "company2.com" and "original.company2". An HTTP request for "page 1" from the client computer is then forwarded to the first server "original.company1" where this webpage is generated or stored.
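The host name correlation described above can be pictured as a simple lookup table. The following Python fragment is a minimal sketch only: `SERVER_DATABASE` and `resolve_origin` are hypothetical names that do not appear in the description, the in-memory dictionary stands in for whatever storage the server database actually uses, and the host names are the examples from the text.

```python
# Hypothetical server database: host name of the website translator
# (as found in the HTTP request) -> host name of the respective first server.
SERVER_DATABASE = {
    "company1.com": "original.company1",
    "company2.com": "original.company2",
}


def resolve_origin(requested_host: str) -> str:
    """Return the first-server host name correlated with the requested host name."""
    # Host names are case-insensitive; an optional ":port" suffix is ignored.
    # (Assumption: IPv6 literals are not used as host names here.)
    host = requested_host.strip().lower().split(":", 1)[0]
    try:
        return SERVER_DATABASE[host]
    except KeyError:
        raise LookupError(f"no first server registered for host {host!r}") from None
```

Because the key is the host name rather than the IP address, one website translator reachable at a single IP address can serve several websites, as long as each host name has its own entry.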
In an embodiment of the present invention, the modifying unit is arranged for extracting text content in the HTML information. For instance, the information to be translated could comprise a word, a phrase or part thereof, or a paragraph of the extracted text content. Typically, text content in a webpage is embedded in between HTML tags. By scanning an HTML document, the relevant text content can be filtered out, for instance based on the HTML tags.
In an embodiment of the present invention, the modifying unit is arranged for extracting a link to media content in the HTML information and to replace the link with a link to a translated version of the media content. The translated version of the media content can be stored on a server different from the first server. It may reside on a dedicated server within the website translator or group of website translators. Such a server may even comprise the server database and the content database, which is discussed later.
In an embodiment of the present invention, the website translator further comprises a computer translator for providing the translated version as a computer translation of the information to be translated.
Computer translators are known. Within the context of the present invention, a computer translator is construed as an automated translation apparatus which provides a translation of inputted text into a desired language. According to the present invention, this language is determined from the HTTP request and/or the IP address of the client computer.
In an embodiment of the present invention, the website translator further comprises a content database having stored therein a correlation between information to be translated and a translation of this information. For instance, the content database could comprise the correlation between "Hello" (for English) and "Bonjour" (for French) and "Hallo" (for Dutch). As such, the content database need not be restricted to a correlation between two languages. Moreover, the content database may include correlations between entire phrases, e.g. between "How are you" (for English), "Comment allez-vous" (for French), and "Hoe gaat het met u" (for Dutch).
In an embodiment of the present invention, the modifying unit is arranged to replace the information to be translated in the HTML information with the translation of this information if this translation is available in the content database. Hence, upon receiving the HTML information from the first server, the HTML information, e.g. an HTML page, is scanned for text content. Subsequently, a translation for the text content is taken from the content database. It is noted that multiple translations are needed as a single HTML page normally contains several pieces of text content.
However, it may happen that a translation is not available, for instance because the original website hosted on the first server has been changed. In such a case, it is advantageous if the modifying unit is arranged to control the computer translator to provide a computer translation of the information to be translated and to replace the information to be translated in the HTML information with the computer translation of this information if the content database does not comprise a translation of this information. This ensures that a translation is always provided.
In an embodiment of the present invention, the website translator comprises a content database management unit arranged for storing a computer translation of information to be translated in the content database. Hence, if a translation is not available in the content database, the generated computer translation can be stored in the content database for future use.
In an embodiment of the present invention, the content database management unit is arranged for storing an externally received translation of the information to be translated in the content database. Examples of such translations can be translations that are not generated by a computer but by a human translator. It is even possible that the content database comprises both the computer translation and the human translation. Attributes can be used to distinguish between these translations, as will be discussed later.
In an embodiment of the present invention, the content database comprises a plurality of entries, each entry comprising a correlation between a hash of the information to be translated and at least one translation of this information. Here, the content database management unit is preferably arranged to hash the information to be translated by applying a predefined hash function to this information and to correlate the hashed information with the at least one translation of this information.
By applying a hashing function to a text segment, a hash is generated, for instance in the form of a hexadecimal number which is, for practical purposes, unique for that text segment. By using hash functions, the content database can easily be searched and constructed.
In an embodiment of the present invention, the content database management unit is further arranged to assign an attribute to the translation of the information to be translated. The attribute allows different translations for the same information to be translated, e.g. a text segment, to be distinguished. Furthermore, the attributes may contain information regarding the translation, such as status, origin, language, and quality.
The content database preferably comprises a correlation between information to be translated and at least two translations thereof, wherein the modifying unit is arranged to select one of said at least two translations based on the attributes assigned to the at least two translations. For instance, the modifying unit may select a particular translation because it has the attribute "EN" (for English) where an English translation is needed. The attribute can therefore be a country or language code, and the modifying unit can be arranged to select one of said at least two translations based on the extracted language identifier, such as a language code and/or IP address.
In an embodiment of the present invention, the modifying unit is arranged for modifying the received HTML information by replacing Cascading Style Sheets (CSS) information with different CSS information corresponding to the used and/or desired language. Apart from text content, a webpage contains layout information. This information can be incorporated as in-line CSS statements, or the HTML code may comprise a link to a CSS file stored for instance on the first server. According to the present invention, this layout information can be replaced with layout information specific for the desired and/or used language. The information can be stored for instance on the content server.
Changing the layout information depending on the desired and/or used language allows for a different layout to be used for a given language. This is particularly useful when changing between languages which are completely different from each other, or for which a different reading direction must be used. For instance, a webpage optimized for the English language may not be useful for presenting Chinese text and vice versa. By changing the layout this problem can be obviated.
The present invention also provides a website translator system, comprising a plurality of website translators as previously described. At least one of the content database, the content database management unit, the computer translator, and the server database is preferably arranged as a central unit common to each of the plurality of website translators. This allows multiple website translators to use a common resource. This is particularly advantageous for the content database. As the network of website translators grows, so does the content in the content database. This allows the modifying unit to select a translation from the content database in more cases and for more languages instead of having to resort to a computer translation. Moreover, selecting a translation from the content database will in most cases prove to be a faster solution than providing a computer translation.
The present invention also provides a method for providing a translated version of a webpage of a website in response to a HyperText Transfer Protocol (HTTP) request from a client computer, wherein the website is hosted on a first server having a first host name. The method comprises the following subsequent steps:
a. receiving the HTTP request;
b. extracting a language identifier from the HTTP request and/or from an Internet Protocol (IP) address of the client computer, the language identifier corresponding to a language used and/or desired by a user of the client computer;
c. forwarding the HTTP request to the first server;
d. receiving HTML information from the first server in response to the HTTP request;
e. modifying the received HTML information by replacing information to be translated in the received HTML information with a translated version thereof in correspondence with the used and/or desired language;
f. sending the modified HTML information to the client computer for display as the translated version of the webpage of the website.
The method could further comprise providing a content database having stored therein a correlation between information to be translated and a translation of this information. The replacing of information to be translated then comprises:
- if a translation of the information to be translated is available in the content database, replacing the information to be translated with this translation;
- if a translation of the information to be translated is not available in the content database, providing a computer translation of this information and replacing this information with the computer translation.
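Steps a to f of the method can be read as a small pipeline. The Python sketch below only illustrates the order of the steps; `handle_request` and its three callable parameters are hypothetical stand-ins for the extracting unit, the forwarding and second receiving units, and the modifying unit, and none of these names come from the description.

```python
from typing import Callable, Dict


def handle_request(
    headers: Dict[str, str],
    path: str,
    extract_language: Callable[[Dict[str, str]], str],
    fetch_from_first_server: Callable[[str, Dict[str, str]], str],
    modify_html: Callable[[str, str], str],
) -> str:
    """Run steps b to e for a request that has already been received (step a)."""
    # b. determine the language used and/or desired by the user
    language = extract_language(headers)
    # c. + d. forward the request to the first server and receive the HTML information
    html = fetch_from_first_server(path, headers)
    # e. replace the information to be translated with its translated version
    modified = modify_html(html, language)
    # f. the caller sends the returned HTML information back to the client computer
    return modified
```

Passing the units in as callables keeps the sketch independent of any particular HTTP library or translation back end.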
Next, the invention will be described in more detail under reference to the accompanying drawings, wherein:
Figure 1 illustrates an embodiment of a website translator according to the present invention; and
Figure 2 shows a network comprising a plurality of website translators according to the present invention.
Figure 1 shows an embodiment of a website translator according to the present invention. A user may operate a client computer 1 to send an HTTP request to website translator 2. Website translator 2 receives the HTTP request using a first receiving unit 3. The HTTP (HTTP/1.1) request typically comprises a request line, such as "GET /home/index.html HTTP/1.1", and a host header such as "host: www.example.com". This HTTP request would request the webpage www.example.com/home/index.html from a server having the host name www.example.com. Website translator 2 is therefore able to extract the host name from the HTTP request. Furthermore, website translator 2 has an Internet Protocol (IP) address acting as the physical address the HTTP request is sent to. It should be noted that multiple host names can be attributed to a single IP address.
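To make the request line and host header concrete, the following Python sketch splits a raw HTTP/1.1 request head into its parts. `parse_http_request` is a hypothetical helper written for this illustration; a real first receiving unit would normally rely on an HTTP server library instead, and the `Accept-Language` value in the sample request is an invented example.

```python
from typing import Dict, Tuple


def parse_http_request(raw: str) -> Tuple[str, str, Dict[str, str]]:
    """Split a raw HTTP/1.1 request head into method, path and lower-cased headers."""
    head = raw.split("\r\n\r\n", 1)[0]          # ignore any message body
    lines = head.split("\r\n")
    method, path, _version = lines[0].split(" ", 2)  # e.g. "GET /home/index.html HTTP/1.1"
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        if not line:
            continue
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, path, headers


request = (
    "GET /home/index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Accept-Language: nl, en;q=0.8\r\n"
    "\r\n"
)
method, path, headers = parse_http_request(request)
host = headers["host"]  # the host name the website translator extracts
```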
Next, a language identifier is extracted by extracting unit 4 from the HTTP request which corresponds to a language used and/or desired by the user. An example is the "accept-language" header field in the HTTP request. However, also the IP address of client computer 1 may be used as this address is related to the geographical location of client computer 1, thereby giving another indication of the language desired and/or used by the user.
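As an illustration of how an extracting unit might derive a language identifier, the Python sketch below parses the "accept-language" header and falls back to a country hint. The function names, the `country_to_language` table and the default language are assumptions made for this example; in particular, the sketch assumes that a separate geolocation lookup has already turned the client IP address into a country code.

```python
from typing import Dict, List, Optional, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (language tag, quality) pairs sorted by descending quality."""
    preferences = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # default weight when no q-value is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not wanted"
        preferences.append((tag.strip().lower(), quality))
    # sorted() is stable, so equal weights keep their order in the header
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


def extract_language(headers: Dict[str, str], country_hint: Optional[str] = None,
                     default: str = "en") -> str:
    """Pick a language from Accept-Language, else from a country hint, else a default."""
    for tag, quality in parse_accept_language(headers.get("accept-language", "")):
        if quality > 0 and tag != "*":
            return tag.split("-")[0]  # "nl-nl" -> "nl"
    # Hypothetical mapping from an IP-derived country code to a language.
    country_to_language = {"NL": "nl", "FR": "fr", "CL": "es"}
    return country_to_language.get(country_hint or "", default)
```

If the two sources disagree, the header wins in this sketch; prompting the user to choose, as mentioned earlier in the description, would be an alternative design.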
Website translator 2 comprises a server database 5 having stored therein a relation between the host name extracted from the HTTP request and the host name of the server which hosts the website requested by the user. For instance, such relation could be "example.com" and "original.example.com", meaning that a user requesting a webpage on the server with the host name "example.com" is in fact requesting a translated version of a webpage of a website hosted on "original.example.com".
Forwarding unit 6 uses server database 5 to forward the received HTTP request to first server 7 based on the host name that is correlated with the host name extracted from the received HTTP request.
In response, first server 7 will send HTML information, e.g. in the form of an HTML page, to second receiving unit 8 of website translator 2.
Typically, an HTML page comprises a plurality of HTML elements. These elements comprise a pair of tags, a start tag and an end tag, as well as some element attributes within the start tag and textual or graphical content between the start and end tags.
Website translator 2 comprises a modifying unit 9 for modifying the received HTML information. It does so by scanning the received HTML information, e.g. the HTML page, looking for tags and extracting the content in between the various start and end tags. If text content is found, modifying unit 9 will consult content database 10 to look for a translation of the extracted text content into the desired and/or used language of the user. Modifying unit 9 will then replace the original text content in the HTML information with the translated version thereof. Consequently, any text content in the requested webpage is translated. Possible forms of text content are single words, phrases or parts thereof, paragraphs, or even entire pages. It should however be noted that it is advantageous to use moderately large units, e.g. phrases or parts thereof, to facilitate the reuse of the translation. On the one hand, parts should be large enough to enable a context to be established between neighboring words, whereas on the other hand the parts should not be too large as this would severely limit the chance that another webpage comprises an identical piece of text.
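The scan-and-replace behaviour of the modifying unit can be sketched with Python's standard `html.parser` module. This is one possible approach under stated assumptions, not the implementation described in the patent: `TextReplacer` and `translate_html` are hypothetical names, the `translate` callable stands in for the lookup in the content database, and the choice to skip `script` and `style` content is an assumption of the sketch.

```python
from html import escape
from html.parser import HTMLParser
from typing import Callable, List


class TextReplacer(HTMLParser):
    """Re-emit an HTML document, passing each text node through `translate`."""

    SKIPPED = {"script", "style"}  # assumption: their content is not natural language

    def __init__(self, translate: Callable[[str], str]) -> None:
        super().__init__(convert_charrefs=True)
        self.translate = translate
        self.output: List[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        self.output.append(self.get_starttag_text())  # keep the tag exactly as received

    def handle_startendtag(self, tag, attrs):
        self.output.append(self.get_starttag_text())  # e.g. "<br/>"

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1
        self.output.append(f"</{tag}>")

    def handle_comment(self, data):
        self.output.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.output.append(f"<!{decl}>")

    def handle_data(self, data):
        stripped = data.strip()
        if self.skip_depth or not stripped:
            self.output.append(data)  # scripts, styles and pure whitespace stay untouched
            return
        # Preserve surrounding whitespace so the layout of the page is not disturbed.
        leading = data[: len(data) - len(data.lstrip())]
        trailing = data[len(data.rstrip()):]
        translated = escape(self.translate(stripped), quote=False)
        self.output.append(leading + translated + trailing)


def translate_html(html: str, translate: Callable[[str], str]) -> str:
    parser = TextReplacer(translate)
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of the document
    return "".join(parser.output)
```

Since only the text nodes are replaced and every tag is copied through as received, the structure and layout of the page stay the same, which is the property the description relies on.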
In addition to text, a webpage may comprise a link to a media file, such as a picture. This picture may contain text information as well. Modifying unit 9 can therefore be arranged to extract the links to these media from the HTML information, and to replace the links with other links that point to corresponding media files, albeit in a different language. The translated versions of these media need not be located on first server 7.
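Replacing media links can be sketched in the same style. In the Python fragment below, `MediaLinkReplacer`, `replace_media_links` and the `translated_media` mapping are hypothetical, the set of tags that carry a `src` link is an assumption, and any URL passed in the mapping is purely illustrative. Tags without a translated counterpart are copied through unchanged.

```python
from html import escape
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

MEDIA_TAGS = {"img", "video", "audio", "source"}  # assumption: tags that carry a src link


class MediaLinkReplacer(HTMLParser):
    """Re-emit HTML unchanged, except for `src` links listed in `translated_media`."""

    def __init__(self, translated_media: Dict[str, str]) -> None:
        super().__init__(convert_charrefs=False)
        self.translated_media = translated_media
        self.output: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        original = self.get_starttag_text() or ""
        src = dict(attrs).get("src")
        if tag not in MEDIA_TAGS or src not in self.translated_media:
            self.output.append(original)  # nothing to replace: keep the tag as received
            return
        parts = [tag]
        for name, value in attrs:
            if name == "src":
                value = self.translated_media[src]  # link to the translated media file
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        closing = " /" if original.endswith("/>") else ""
        self.output.append("<" + " ".join(parts) + closing + ">")

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)  # the "/>" ending is preserved above

    def handle_endtag(self, tag):
        self.output.append(f"</{tag}>")

    def handle_data(self, data):
        self.output.append(data)

    def handle_entityref(self, name):
        self.output.append(f"&{name};")

    def handle_charref(self, name):
        self.output.append(f"&#{name};")

    def handle_comment(self, data):
        self.output.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.output.append(f"<!{decl}>")


def replace_media_links(html: str, translated_media: Dict[str, str]) -> str:
    parser = MediaLinkReplacer(translated_media)
    parser.feed(html)
    parser.close()
    return "".join(parser.output)
```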
Next, a sending unit 11 sends the modified HTML information to the user's computer 1 for display as the translated version of the website.
Website translator 2 may comprise a computer translator 12 for providing the translations of the text content. Website translator 2 may be arranged to retrieve the required translation directly from computer translator 12 instead of accessing content database 10. However, it is advantageous if modifying unit 9 first inspects content database 10 to look for a translation and uses such translation if it is present. In case such translation is not present, computer translator 12 may provide one.
Website translator 2 may comprise a content database management unit 13 to manage the addition and deletion of translations in content database 10 as well as the access to it. To that end, it may receive translations for text content from one or more human translators 14 and/or from computer translator 12.
In order to organize content database 10, content database management unit 13 applies a hashing function to the information to be translated. For instance, the word "hello" becomes the hash 0x1245fd4e5, being a hexadecimal number. In content database 10, this hash is linked to a translation. Whenever modifying unit 9 needs a translation of a piece of text, it will consult content database 10 via content database management unit 13, wherein the content database management unit 13 will first hash the information to be translated and subsequently look for a corresponding translation in content database 10. If such translation is found, that translation is returned to modifying unit 9. In case such translation is not found, computer translator 12 is consulted to provide one. This translation may subsequently be linked by the content database management unit 13 to the hash of the information to be translated to enable future use.
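The hash-keyed lookup with a computer translation fallback can be sketched as follows. The class name `ContentDatabase`, the in-memory dictionary, and the use of SHA-256 are assumptions made for this illustration; the description only requires a predefined hash function, and the `computer_translate` callable stands in for computer translator 12.

```python
import hashlib
from typing import Callable, Dict


class ContentDatabase:
    """In-memory sketch: hash of the source text -> {language: translation}."""

    def __init__(self, computer_translate: Callable[[str, str], str]) -> None:
        self._entries: Dict[str, Dict[str, str]] = {}
        self._computer_translate = computer_translate

    @staticmethod
    def hash_text(text: str) -> str:
        # SHA-256 is an assumption; any predefined hash function would do.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def store(self, text: str, language: str, translation: str) -> None:
        """Link a translation to the hash of the information to be translated."""
        self._entries.setdefault(self.hash_text(text), {})[language] = translation

    def lookup(self, text: str, language: str) -> str:
        """Return the stored translation, or create, store and return a computer translation."""
        key = self.hash_text(text)
        entry = self._entries.get(key, {})
        if language not in entry:
            # Not found: consult the computer translator and keep the result for future use.
            self.store(text, language, self._computer_translate(text, language))
            entry = self._entries[key]
        return entry[language]
```

Keying on a fixed-length hash keeps lookups independent of the length of the text segment, and the second lookup of the same segment no longer needs the computer translator.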
As shown in figure 1, several sources of translations exist. Moreover, for each piece of information to be translated, several translations in several languages may exist. For instance, an English phrase may have a corresponding translation in Dutch as well as three translations in Spanish. Of these three translations in Spanish, two are generated by a person, whereas one is generated by computer translator 12. To distinguish between these translations, attributes may be assigned to them. For instance, the Spanish translations may contain the following attributes: [C/H] for computer or human translation, [Am/Eu] for South American Spanish or European Spanish, and [H/M/L] for setting a level of quality ranging from (h)igh, to (m)edium, to (l)ow. Modifying unit 9 can be arranged to select one of the available translations based on the attributes assigned to them. For instance, if the IP address of client computer 1 reveals that the user is located in Chile, modifying unit 9 will use the South American Spanish translation.
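Attribute-based selection can be illustrated with a small ranking function. In the Python sketch below, the `Translation` record and `select_translation` are hypothetical, and the preference order (matching region first, then human over computer, then quality) is an assumption of this example; the description leaves the selection policy open.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Translation:
    text: str
    language: str  # language code, e.g. "es"
    region: str    # "Am" or "Eu", following the example attributes
    origin: str    # "H" (human) or "C" (computer)
    quality: str   # "H", "M" or "L"


QUALITY_RANK = {"H": 2, "M": 1, "L": 0}


def select_translation(candidates: List[Translation], language: str,
                       region: Optional[str] = None) -> Optional[Translation]:
    """Pick the best candidate for a language, preferring region, human origin, quality."""
    matching = [t for t in candidates if t.language == language]
    if not matching:
        return None

    def score(t: Translation):
        return (
            region is not None and t.region == region,  # requested regional variant first
            t.origin == "H",                            # then human over computer
            QUALITY_RANK.get(t.quality, -1),            # then the quality attribute
        )

    return max(matching, key=score)
```

For a user located in Chile, a caller would pass `language="es"` and `region="Am"`, so that a South American Spanish translation is chosen whenever one exists.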
The website translator according to the present invention is particularly well suited to handle dynamic content, i.e. a webpage for which the text content changes often. Normally, when text content is extracted during the modification process, it is determined whether a translation is available in content database 10. In case a translation is not available, for instance because the content of the original webpage has changed, the system can be configured to provide a computer translation. In addition, a request to a human translator may be issued. At a later stage, the translation provided by the human translator can be used in addition to or instead of the computer translation. In this way, the translated version of the webpage is always synchronous with the original webpage.
The present invention can easily be scaled, wherein a plurality of website translators 2 are arranged to each provide a translation of at least one website hosted on different first servers. Such a network is illustrated in figure 2. Each of the two depicted website translators 2 is arranged to provide a translation of two different websites that are hosted on two different first servers. Again, the information in server database 5 is used to correctly ascertain which HTML information is requested.
As the website translators 2 all serve the same purpose, it is advantageous to combine common components. For instance, the content database, content database management unit and server database can be arranged as common central units accessible to each website translator. Such an arrangement allows the various website translators to benefit from each other: a website translator can use a translation for a given piece of text content that was originally translated for a different website translator. Such an arrangement also allows a centralized approach for modifying and maintaining the various databases.
Although the invention has been described in detail, it should be obvious to the skilled person in the art that various modifications are possible to the embodiments shown without departing from the scope of the invention which is defined by the appended claims.
Claims (21)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
NL2006294A NL2006294C2 (en) | 2011-02-24 | 2011-02-24 | Website translator, system, and method. |
PCT/NL2012/000016 WO2012115507A1 (en) | 2011-02-24 | 2012-02-24 | Website translator, system and method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
NL2006294A NL2006294C2 (en) | 2011-02-24 | 2011-02-24 | Website translator, system, and method. |
NL2006294 | 2011-02-24 |
Publications (1)
Publication Number | Publication Date |
---|---|
NL2006294C2 (en) | 2012-08-27 |
Family
ID=44514325
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
NL2006294A NL2006294C2 (en) | 2011-02-24 | 2011-02-24 | Website translator, system, and method. |
Country Status (2)
Country | Link |
---|---|
NL (1) | NL2006294C2 (en) |
WO (1) | WO2012115507A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10296968B2 (en) | 2012-12-07 | 2019-05-21 | United Parcel Service Of America, Inc. | Website augmentation including conversion of regional content |
US20150261880A1 (en) * | 2014-03-15 | 2015-09-17 | Google Inc. | Techniques for translating user interfaces of web-based applications |
US9965466B2 (en) | 2014-07-16 | 2018-05-08 | United Parcel Service Of America, Inc. | Language content translation |
SG11201808470QA (en) | 2016-04-04 | 2018-10-30 | Wovn Technologies Inc | Translation system |
US11373048B2 (en) | 2019-09-11 | 2022-06-28 | International Business Machines Corporation | Translation of multi-format embedded files |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040167784A1 (en) * | 2003-02-21 | 2004-08-26 | Motionpoint Corporation | Dynamic language translation of web site content |
US20090157381A1 (en) * | 2007-12-12 | 2009-06-18 | Microsoft Corporation | Web translation provider |
US20090192783A1 (en) * | 2008-01-25 | 2009-07-30 | Jurach Jr James Edward | Method and System for Providing Translated Dynamic Web Page Content |
KR20100091923A (en) * | 2009-02-10 | 2010-08-19 | 오의진 | Method of servicing translation of web page written in many languages |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2593884A2 (en) * | 2010-07-13 | 2013-05-22 | Motionpoint Corporation | Dynamic language translation of web site content |
- 2011-02-24: NL application NL2006294A, patent NL2006294C2 (en), not active (IP Right Cessation)
- 2012-02-24: WO application PCT/NL2012/000016, patent WO2012115507A1 (en), active (Application Filing)
Also Published As
Publication number | Publication date |
---|---|
WO2012115507A1 (en) | 2012-08-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | SD | Assignments of patents | Effective date: 20140225 |
 | HC | Change of name(s) of proprietor(s) | Owner name: TOLQ.COM IP B.V.; NL. Free format text: DETAILS ASSIGNMENT: CHANGE OF OWNER(S), CHANGE OF OWNER(S) NAME; FORMER OWNER NAME: EXVO.COM GLOBALIZE B.V. Effective date: 20180212 |
 | MM | Lapsed because of non-payment of the annual fee | Effective date: 20230301 |